"nonredundant," information.

The theory of decoherence has resolved much of the decades-old
confusion about the transition from quantum to classical physics
(see articles in [?]). It provides a mechanism - weak measurement
by the environment - by which a quantum system can be compelled
to behave classically. The recent development of quantum
information theory has encouraged an information-theoretic view
of decoherence, wherein information about a central system
"leaks out" into the environment, and thereby becomes classical [?].

In this paper, we pursue a natural extension of the 
decoherence program, by asking "What happens to the 
information that leaks out of the system?" That infor- 
mation should be sought in the "rest of the universe" 
- i.e., the system's environment. The environment is 
a witness to the system's state, and can serve as a re- 
source for measuring or controlling the system. Our 
particular focus, within this Environment as a Witness 
paradigm, is on how redundantly information about the 
system is recorded in the environment. This is relevant to 
quantum technology; a detailed picture of how decoher- 
ence destroys quantum information may help in designing 
schemes to correct its effects. 

It also illuminates fundamental physics. Massive redundancy can
cause certain information to become objective, at the expense of
other information. The process by which this "fittest" information
is propagated through the environment, at the expense of
incompatible information, is Quantum Darwinism. Two forthcoming
papers [?, ?] will investigate the dynamics of quantum Darwinism.

This paper is focused on the kinematics of information storage
and the environment-as-a-witness paradigm. It is organized as
follows. In Section I we introduce objectivity and the
"environment as a witness" paradigm, show that redundant records
indicate objectivity, and propose quantitative and qualitative
measures of redundancy. In Section II we analyze randomly
distributed states, show that they do not display redundant
information storage, and argue that they do not describe the
Universe (see next paragraph) in which we live. In Section III we
propose singly-branching states as an alternative description,
and use numerics to demonstrate redundant information storage.
Section IV presents an analytical model for the numerical results.
Finally, we summarize our most important results and discuss
future work in Section V.

We use the word "universe" to denote both (a) ev- 
erything that exists in reality, and (b) a self-contained 
model of a system and its environment. We distinguish 
the two by capitalizing usage (a). Thus, while living in 
the Universe, we simulate assorted universes. 



I. THE ENVIRONMENT AS A WITNESS 

Previous studies of decoherence have focused on the system's
reduced density matrix ($\rho_\mathcal{S}$), and on master
equations that describe its evolution. To study information flow
into the environment, we require a new paradigm.

We begin with a simple observation: information about a system
($\mathcal{S}$) is obtained by measuring its environment
($\mathcal{E}$) (see [?]). Although the standard theories of
quantum measurement (see e.g. Von Neumann [?], etc.) presume a
direct measurement on the system, real experiments rely on
indirect measurements. As you read this paper, you measure the
albedo of the page - but actually, your eyes are capturing photons
from the electromagnetic environment. Information about the page
is inferred from assumed correlations between text and photons.
A similar argument holds for every physics experiment; the
scientist gets information about $\mathcal{S}$ by capturing and
measuring a fragment of $\mathcal{E}$.

This motivates us to focus on correlations between $\mathcal{S}$
and individual fragments of $\mathcal{E}$. In particular, we will
seek to determine whether a particular state - or a particular
ensemble of states - allows an observer who captures a small
fragment of $\mathcal{E}$ to deduce the system's state. If so,
then the system's state is objectively recorded.



A. Objectivity 

A property - e.g., the state of a system - is objec- 
tive when many independent observers agree about it. 
The observers' independence is crucial. When many sec- 
ondary observers are informed by a single primary ob- 
server, then only the primary observer's opinion is ob- 
jective, not necessarily the property which he observed. 
Independent observers, examining a single quantum sys- 
tem, cannot have agreed on a particular measurement 
basis beforehand. They will generally measure different 
observables - and therefore will not agree afterward. An 
isolated quantum system's state cannot be objective, be- 
cause measurements of noncommuting observables inval- 
idate each other. 

Classical theory, on the other hand, permits observers 
to measure a system without disturbing it. Properties of 
classical systems (e.g., classical states) are thus objective. 
Each observer can record the state in question without 
altering it, and afterward all the observers will agree on 
what they discovered. Of course, observers may obtain 
different information - e.g., one observer may make a 
more effective measurement than another - but not con- 
tradictory information. 

Objectivity provides an excellent criterion for explor- 
ing the emergence of classicality through decoherence. 
A quantum system becomes more classical as its measur- 
able properties become more objective. The use of "mea- 
surable" is significant. Nothing can make every property 
of a quantum system objective, because some observables 
are incompatible with others. Two observers can never 
simultaneously obtain reliable information about incom- 
patible observables (such as position and momentum) of 
the same system. Decoherence partially solves this prob- 
lem by destroying all the observables incompatible with 
a system's pointer observable. We are thus motivated 
to explore (a) how the pointer observable becomes ob- 
jective, and (b) how decoherence and the emergence of 
objectivity are related. 



B. Technical details and assumptions 

This "environment as a witness" paradigm [?] is ideally suited to
exploring objectivity. In order to make independent measurements
of $\mathcal{S}$, multiple observers must partition the
environment into fragments. In this paper, we assume that
measurements must be made on distinct Hilbert spaces in order to
be independent, so we divide the environment into fragments as

$\mathcal{E} = \mathcal{E}_A \otimes \mathcal{E}_B \otimes \mathcal{E}_C \otimes \cdots$   (1)

Several factors limit an observer's ability to obtain information
about $\mathcal{S}$ by measuring a fragment of the environment
($\mathcal{E}_A$). We can make more or less optimistic assumptions
about some of these factors, but the degree of correlation
between $\mathcal{S}$ and $\mathcal{E}_A$ is clearly a limiting
factor. An observer whose particular fragment is not correlated
with $\mathcal{S}$ has no way to obtain information about
$\mathcal{S}$. That fragment of $\mathcal{E}$ is irrelevant and,
for the purpose of gaining information about $\mathcal{S}$, might
as well not exist. The absolute prerequisite for demonstrating a
property's objectivity is that information about it be recorded
in many fragments - that is, redundantly.

We quantify redundancy by counting the number of fragments which
can provide sufficient information. The redundancy of information
about some property is a natural measure of that property's
objectivity [?]. Classical properties are objective because
information about them is recorded with [effectively] infinite
redundancy.
For instance, if we flip a coin, then its final orientation 
is recorded by trillions of scattered photons. Thousands 
of cameras, each capturing a tiny fraction of them, could 
each provide a record. Redundancy is not dependent on 
actual observers. Instead, it is a statement about what 
observers could do, if they existed. 

A pertinent question is "Why not allow an observer 
to measure the system itself?" First, only one observer 
could be allowed to do so without sacrificing indepen- 
dence. Thus, at most, this would increase redundancy 
by 1. Furthermore, an observer with access to the cen- 
tral system could measure it in some weird basis, thus 
destroying its state. Since it's not then clear what the 
information obtained by the other observers would refer 
to, we regard the system itself as off limits to observers. 

C. The overall program 

The work presented here is a natural extension of the decoherence
program. However, employing the environment as a communication
channel - not just as a "sink" for information lost to
decoherence - is also in a sense "beyond decoherence." It is the
next stage in exploring how classicality emerges from the quantum
substrate.

In order to fully understand the role that redundancy 
and objectivity play in (1) the emergence of classicality, 
and (2) the destruction of quantum coherence, we'd like 
to answer the following questions: 

1. Given a state $\rho_{\mathcal{SE}}$ for the system and its
environment (the "universe"), how do we quantify the redundancy
of information (about $\mathcal{S}$) in $\mathcal{E}$?

2. For a particular "universe," what states are typical (that is,
likely to exist)? Do they display redundancy? If so, how much?

3. What sorts of (a) initial states, and (b) dynamics 
lead dynamically to redundancy? 

4. Do realistic models of decoherence produce the 
massive redundancy we expect in the classical 
regime? 

5. For complicated systems, with many independent 
properties, how do we distinguish what property a 
bit of information is about? 

6. When information about an observable is redun- 
dantly recorded, is information about incompatible 
observables inaccessible? 

The building blocks of this work - e.g., the reasoning presented
in this section - have been laid in recent years by [?]. The
first attempt to address items (1) and (3) appeared in [?], and
was refined in [?], which also analyzed a particular simple model
of decoherence numerically. In the current paper, we answer (1)
and (2) in detail, and consider (3) briefly.

D. Computing redundancy 

To compute the redundancy ($R$) of some information
($\mathcal{I}$), we divide the environment into fragments
($\mathcal{E} = \mathcal{E}_A \otimes \mathcal{E}_B \otimes \cdots$),
and demand that each fragment supply $\mathcal{I}$ independently.
The redundancy of $\mathcal{I}$ is the number of such fragments
into which the environment can be divided. A generalized GHZ
state is a good example:

$|\psi\rangle_{\mathcal{SE}} = \alpha\,|0\rangle_\mathcal{S}\,|00000\ldots0\rangle_\mathcal{E} + \beta\,|1\rangle_\mathcal{S}\,|11111\ldots1\rangle_\mathcal{E}$   (2)

We can determine the system's state by measuring any
subenvironment. Each qubit in $\mathcal{E}$ provides all the
available information about $\mathcal{S}$ (see, however, note [?]).
To extend this analysis to arbitrary states, we need (a) a metric
for information, (b) a protocol for dividing the environment into
fragments, and (c) an idea of how much of $\mathcal{I}$ is
"available".
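The GHZ example can be checked numerically. The sketch below is ours, not part of the original analysis: it assumes NumPy, takes the quantum mutual information of Eq. (3) as the metric, and uses hypothetical helper names (`subsystem_entropy`, `mutual_information`, `ghz_state`). Entropies are in bits, and index 0 labels the system.

```python
import numpy as np

def subsystem_entropy(psi, dims, keep):
    """Von Neumann entropy (bits) of the subsystems `keep` of a pure state."""
    keep = list(keep)
    if not keep or len(keep) == len(dims):
        return 0.0  # empty subsystem, or the whole (pure) universe
    rest = [i for i in range(len(dims)) if i not in keep]
    tensor = psi.reshape(dims).transpose(keep + rest)
    d_keep = int(np.prod([dims[i] for i in keep]))
    # Squared singular values are the eigenvalues of the reduced density matrix.
    probs = np.linalg.svd(tensor.reshape(d_keep, -1), compute_uv=False) ** 2
    probs = probs[probs > 1e-12]
    return float(-np.sum(probs * np.log2(probs)))

def mutual_information(psi, dims, a, b):
    """I_{A:B} = H_A + H_B - H_AB."""
    a, b = list(a), list(b)
    return (subsystem_entropy(psi, dims, a)
            + subsystem_entropy(psi, dims, b)
            - subsystem_entropy(psi, dims, a + b))

def ghz_state(n_env, alpha, beta):
    """alpha|0>|0...0> + beta|1>|1...1> on one system qubit plus n_env qubits."""
    psi = np.zeros(2 ** (n_env + 1), dtype=complex)
    psi[0], psi[-1] = alpha, beta
    return psi

n_env = 5
dims = [2] * (n_env + 1)  # index 0: system; 1..n_env: environment qubits
psi = ghz_state(n_env, 1 / np.sqrt(2), 1 / np.sqrt(2))

# Information about S held by each single environment qubit, and by all of E.
single = [mutual_information(psi, dims, [0], [k]) for k in range(1, n_env + 1)]
whole = mutual_information(psi, dims, [0], range(1, n_env + 1))
```

For equal amplitudes, each qubit of the environment on its own supplies one bit, which equals the system's entropy, while the whole environment supplies two bits; the extra bit reflects purely quantum correlations.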




[Figure 1, panel titles: (a) Decoherence Paradigm: Universe is
divided into ...; (b) Redundancy Paradigm: Environment is divided
...; (c) Subenvironments are combined into Fragments that each
have nearly-complete information.]

FIG. 1: (Color) Three ways to divide up the universe. The
decoherence paradigm divides the universe into a system
($\mathcal{S}$) and an environment ($\mathcal{E}$) as in (a). In
the environment-as-a-witness paradigm, we further subdivide
$\mathcal{E}$ into subenvironments, as in (b). No subenvironment
can be further subdivided, and it is easier to measure one
$\mathcal{E}_i$ than to make a joint measurement on several.
Fragments are constructed, so as to provide enough information to
infer the state of $\mathcal{S}$, by combining subenvironments as
in (c). Measurements on distinct fragments always commute.



1. A metric for information

We use quantum mutual information (QMI) as an information metric.
QMI is a generalization of the classical mutual information [15].
Quantum mutual information is defined in terms of the Von Neumann
entropy, $H = -\mathrm{Tr}(\rho \log \rho)$, as:

$\mathcal{I}_{A:B} = H_A + H_B - H_{AB}$   (3)

This is simple to calculate, provides a reliable measure of
correlation between systems, and has been used previously for
this purpose [?]. Unlike classical mutual information, the QMI
between system A and system B is not bounded by the entropy of
either system. In the presence of entanglement, the QMI can be as
large as $H_A + H_B$, which reflects the existence of quantum
correlations beyond the classical ones [?].



2. Dividing $\mathcal{E}$ into fragments



A pre-existing concept of locality, usually expressed as a fixed
tensor product structure or as a set of allowable structures, is
fundamental to redundancy analysis. Allowing an arbitrary division
of $\mathcal{E}$ into fragments would make every state where
$\mathcal{S}$ is entangled with $\mathcal{E}$ (see note [40])
equivalent (via re-division of $\mathcal{E}$) to a GHZ-like state
(Eq. 2). Decoherence would then be equivalent to redundancy.

The need for a fixed tensor product structure is familiar; both
decoherence and entanglement are meaningless without a fixed
division between the system and its environment (see e.g. [?] for
a discussion of possible tensor product structures' origins in
measurable observables; an explanation that does not refer to
measurements would be needed in the present context). In the
environment-as-a-witness paradigm, we divide $\mathcal{E}$ into
indivisible subenvironments:

$\mathcal{E} = \mathcal{E}_1 \otimes \mathcal{E}_2 \otimes \mathcal{E}_3 \otimes \cdots \otimes \mathcal{E}_{N_{\rm env}}.$   (4)

These subenvironments can be rearranged into larger fragments. A
generic fragment consisting of $m$ subenvironments will be written
as $\mathcal{E}_{\{m\}}$. The fragment containing the particular
subenvironments $\{\mathcal{E}_{i_1}, \mathcal{E}_{i_2}, \ldots, \mathcal{E}_{i_m}\}$
is denoted $\mathcal{E}_{\{i_1, i_2, \ldots, i_m\}}$.

We assume that each observer captures a random fragment of
$\mathcal{E}$. This ensures their strict independence. In essence,
we do not allow the observers to caucus over the partition of
$\mathcal{E}$, dividing it up in an advantageous way.


3. How much information is practically available 

The maximum information that could be provided about
$\mathcal{S}$ is its entropy, $H_\mathcal{S}$. In general, no
fragment can provide all this information [44]. Following the
reasoning in [?], we demand that each fragment provide some large
fraction, $1 - \delta$ (where $\delta \ll 1$), of the available
information about $\mathcal{S}$. The precise magnitude of the
information deficit $\delta$ should not be important. We denote
the redundancy of "all but $\delta$ of the available information"
by $R_\delta$. That is, when we allow a deficit of $\delta = 0.1$,
we are computing $R_{0.1}$ or $R_{10\%}$.

To compute $R_\delta$, we start by defining $N_\delta$ as the
number of disjoint fragments $\mathcal{E}_i$ such that
$\mathcal{I}_{\mathcal{S}:\mathcal{E}_i} \geq (1 - \delta) H_\mathcal{S}$.
We might just define $R_\delta = N_\delta$, except for two caveats.

1. A large deficit ($\delta$) in the definition of "sufficient"
information could lead to spurious redundancy. Suppose there
exist $N = 5$ fragments that provide full information. If
$\delta = 0.5$, then we might split each fragment in half to
obtain $N_\delta = 10$ fragments that each provide "sufficient"
information. To compensate for this, we replace $N_\delta$ with
$(1 - \delta) N_\delta$.

2. Because of quantum correlations,
$\mathcal{I}_{\mathcal{S}:\mathcal{E}_i}$ can be as high as
$2H_\mathcal{S}$. We allow for this by assuming that the
information provided by one fragment represents strictly quantum
correlations, and throwing this fragment away. This means
replacing $(1 - \delta) N_\delta$ with $(1 - \delta) N_\delta - 1$.

FIG. 2: (Color) Three profiles for partial information plots
($\mathcal{I}$ vs. $m$). (a): the behavior of independent
environments, (b): information is stored redundantly, (c):
information is encoded in multiple environments.

By assuming the worst case, we have obtained a lower bound for
the true redundancy:

$R_\delta \geq (1 - \delta) N_\delta - 1.$   (5)

For small $\delta$, this is fairly tight, as $N_\delta$ is clearly
an upper bound. Since our current toolset, subject to the caveats
mentioned above, does not permit a more precise determination of
$R_\delta$, we report the lower bound throughout. Thus, when we
report "$R_{10\%} = 9$," we really mean "$R_{10\%}$ is at least 9,
and not much more."
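As a small worked illustration of the bound in Eq. (5), the following sketch evaluates it for the fragment-splitting example of caveat 1 and for a small deficit. The function name `redundancy_lower_bound` and the second set of numbers are ours, chosen only for illustration.

```python
def redundancy_lower_bound(n_delta, delta):
    """Lower bound of Eq. (5): R_delta >= (1 - delta) * N_delta - 1."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return (1.0 - delta) * n_delta - 1.0

# Caveat 1: 5 fragments with full information, each split in half,
# give N_delta = 10 "sufficient" fragments at delta = 0.5.
bound_split = redundancy_lower_bound(n_delta=10, delta=0.5)

# A small deficit: 100 sufficient fragments at delta = 0.1 (illustrative).
bound_small_deficit = redundancy_lower_bound(n_delta=100, delta=0.1)
```

In the first case the bound is 4, below the 5 fragments that actually carry full information, so the splitting trick does not inflate the reported redundancy. In the second case the bound is 89, against an upper bound of 100.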



E. Identifying qualitative redundancy 

The actual amount of redundancy is often less important than the
qualitative observation that information is stored very
redundantly (e.g., $R \gg 1$). Whether $R \approx 100$ or
$R \approx 1000$, the information in question is certainly
objective - but if $R \approx 1$, then its objectivity is in
doubt. We also wish to consider more general questions: e.g., how
much does $R_\delta$ depend on $\delta$? Or why does a state
display virtually no redundancy?

For these purposes, we plot the amount of information about
$\mathcal{S}$ supplied by a fragment of size $m$
($\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}}$), against $m$.
Since there are very many fragments of a given size, we average
$\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}}$ over a
representative sample of fragments to obtain $\mathcal{I}(m)$.
The plot of $\mathcal{I}(m)$, which shows the partial information
yielded by a partial environment, is a partial information plot
(PIP). When the universe is in a pure state (see [23] and
Appendix ??), the PIP must be anti-symmetric around its center
(see Fig. 2). Together with the observation that $\mathcal{I}(m)$
must be strictly non-decreasing (capturing more of the
environment cannot decrease the amount of information obtained),
this permits the three basic profiles shown in Figure 2.
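The anti-symmetry can be seen in a few lines. This is our own short sketch of the standard argument, not a reproduction of the Appendix; $\mathcal{F}$ denotes a fragment and $\bar{\mathcal{F}}$ its complement within $\mathcal{E}$ (our notation). For a pure universe, complementary subsystems have equal entropy, so $H_{\mathcal{SF}} = H_{\bar{\mathcal{F}}}$ and $H_{\mathcal{S}\bar{\mathcal{F}}} = H_{\mathcal{F}}$.

```latex
\begin{align}
\mathcal{I}_{\mathcal{S}:\mathcal{F}} + \mathcal{I}_{\mathcal{S}:\bar{\mathcal{F}}}
  &= \left(H_\mathcal{S} + H_\mathcal{F} - H_{\mathcal{SF}}\right)
   + \left(H_\mathcal{S} + H_{\bar{\mathcal{F}}} - H_{\mathcal{S}\bar{\mathcal{F}}}\right) \\
  &= 2H_\mathcal{S} + \left(H_\mathcal{F} - H_{\mathcal{S}\bar{\mathcal{F}}}\right)
   + \left(H_{\bar{\mathcal{F}}} - H_{\mathcal{SF}}\right) \\
  &= 2H_\mathcal{S}.
\end{align}
```

Averaging over all fragments of size $m$, whose complements have size $N_{\rm env} - m$, gives $\mathcal{I}(m) + \mathcal{I}(N_{\rm env} - m) = 2H_\mathcal{S}$, so the PIP is anti-symmetric about the point $(N_{\rm env}/2,\, H_\mathcal{S})$.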

Redundancy (see Fig. 2b) is characterized by a rapid rise of
$\mathcal{I}$ at relatively small $m$, followed by a long
"classical plateau". In this region, all the easily available
information has been obtained. Additional environments confirm
what is already known, but provide nothing new. Only by capturing
all the environments can an observer manipulate quantum
correlations. The power to do so is indicated by the sharp rise in
$\mathcal{I}$ at $m \approx N_{\rm env}$.



II. INFORMATION STORAGE IN RANDOM STATES

Redundant information storage is ubiquitous in the classical
world. We might naively expect that randomly chosen states of a
model universe - e.g., a $D_\mathcal{S}$-dimensional system in
contact with a bath of $N_{\rm env}$ $D_\mathcal{E}$-dimensional
systems - would display massive redundancy. To test this
hypothesis, we compute partial information plots for random
states, and average them over the uniform ensemble. This was
first done in [20], for qubits. In this work, we extend the
analysis to systems and environments with arbitrary sizes.



A. The uniform ensemble 

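For any [finite] $D$-dimensional Hilbert space, there exists a
unitarily invariant uniform distribution over states, usually
referred to as Haar measure. We examine the behavior of typical
random states by averaging PIPs over this uniform ensemble. This
average can be obtained analytically, using a formula for the
average entropy of a subspace that was conjectured by Page [21],
then proved by Sen [?] and others [?].

Page's formula [21, 22, 23] for the mean entropy $H(m, n)$ of an
$m$-dimensional subsystem of an $mn$-dimensional system (where
$m \leq n$) is

$H(m, n) = \sum_{k=n+1}^{mn} \frac{1}{k} - \frac{m-1}{2n}$   (6)

$\phantom{H(m, n)} = \Psi(mn+1) - \Psi(n+1) - \frac{m-1}{2n},$   (7)

where the latter expression is given in terms of the digamma
function $\Psi$. For a $D_\mathcal{S}$-dimensional system in
contact with $N_{\rm env}$ environments of size $D_\mathcal{E}$,
the average mutual information between the system and $m$
subenvironments is

$\mathcal{I}(m) = H(D_\mathcal{S}, D_\mathcal{E}^{N_{\rm env}}) + H(D_\mathcal{E}^{m}, D_\mathcal{S} D_\mathcal{E}^{N_{\rm env}-m}) - H(D_\mathcal{S} D_\mathcal{E}^{m}, D_\mathcal{E}^{N_{\rm env}-m}),$   (8)

where the two arguments of each $H$ are exchanged whenever
necessary so that the smaller dimension comes first.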
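The averaged PIP for the uniform ensemble follows directly from Page's formula. The sketch below is ours and uses only the Python standard library. It evaluates the finite sum in Page's formula (which gives the entropy in nats), converts to bits, and combines three such entropies as H_S + H_fragment - H_joint. The function names and the choice of 8 environment qubits are illustrative assumptions.

```python
import math

def page_entropy(m, n):
    """Page's mean entropy, in bits, of an m-dimensional subsystem of an
    (m*n)-dimensional system in a Haar-random pure state."""
    m, n = min(m, n), max(m, n)  # the formula assumes m <= n
    nats = math.fsum(1.0 / k for k in range(n + 1, m * n + 1)) - (m - 1) / (2.0 * n)
    return nats / math.log(2)

def mean_information(m, d_sys, d_env, n_env):
    """Average mutual information between the system and m subenvironments."""
    h_sys = page_entropy(d_sys, d_env ** n_env)
    h_frag = page_entropy(d_env ** m, d_sys * d_env ** (n_env - m))
    h_joint = page_entropy(d_sys * d_env ** m, d_env ** (n_env - m))
    return h_sys + h_frag - h_joint

# A qubit system coupled to 8 environment qubits.
n_env = 8
h_sys = page_entropy(2, 2 ** n_env)
pip = [mean_information(m, d_sys=2, d_env=2, n_env=n_env) for m in range(n_env + 1)]
```

The resulting list starts at zero, stays small until about half the environment is captured, and ends at twice the mean system entropy, in line with the anti-symmetry required of a pure universe.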

B. Partial information plots (PIPs) 

Our results (Figs. 3-5) demonstrate that typical states from the
uniform ensemble do not display redundancy. Figure 3 illustrates
typical behavior. As an observer captures successively more
subenvironments (increasing $m$), he gains virtually no
information about $\mathcal{S}$.
$\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}}$ remains close to
zero. When approximately 50% of the subenvironments have been
captured, the observer begins to gain information. $\mathcal{I}$
rises rapidly, through $H_\mathcal{S}$ and onward nearly to
$2H_\mathcal{S}$.



Information about $\mathcal{S}$ is encoded in the environment (as
in Fig. 2c), much as a classical bit can be encoded in the parity
of an ancillary bitstring. In the classical example, however,
every bit of the ancilla must be captured to deduce the encoded
bit.
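The classical parity example is easy to verify by brute force. The sketch below is ours, with hypothetical names: it enumerates all bitstrings of a given parity and compares the distribution seen on each proper subset of the ancilla bits for the two values of the encoded bit.

```python
from collections import Counter
from itertools import combinations, product

def marginal(parity, n_bits, subset):
    """Distribution of the bits in `subset`, taken over all n_bits-long
    strings whose parity equals the encoded bit."""
    counts = Counter(
        tuple(bits[i] for i in subset)
        for bits in product((0, 1), repeat=n_bits)
        if sum(bits) % 2 == parity
    )
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}

n_bits = 4
proper_subsets = [s for k in range(1, n_bits)
                  for s in combinations(range(n_bits), k)]

# Subsets whose statistics differ between encoded bit 0 and encoded bit 1.
leaks = [s for s in proper_subsets
         if marginal(0, n_bits, s) != marginal(1, n_bits, s)]

full = tuple(range(n_bits))
full_reveals = marginal(0, n_bits, full) != marginal(1, n_bits, full)
```

No proper subset distinguishes the two encoded values; only the full ancilla does. The quantum "encoding" states discussed here are less strict, since a majority of the subenvironments already suffices.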

This encoding, or "anti-redundancy", is related to quantum error
correction [25, ?, 28]. In an encoding state, any majority subset
of the $\mathcal{E}_i$ has nearly-complete information. The
recorded information is unaffected by the loss of any minority
subset. States with this behavior can be used as a quantum code
to protect against bit loss. Our results show that generic
states - i.e., states selected randomly from the whole
$\mathcal{SE}$ Hilbert space - form a nearly-optimal
error-correction code for bit-loss errors. Shannon noted similar
behavior for classical codewords [?].

Figures 3b and 4 extend this result to larger systems. The
results are consistent; information is still encoded, and only
the total amount of encoded information changes.



C. Conclusions 

Our first main result is that typical states selected randomly
from the uniform ensemble display no redundant information
storage. Instead, they display encoding or anti-redundancy. This
is not to say that all states are "antiredundant", merely that
redundant information storage is rare. As $m$ declines from
$N_{\rm env}/2$, $\mathcal{I}(m)$ declines exponentially. For
large $N_{\rm env}$, states where information is not encoded this
way are vanishingly rare. If even a small fixed fraction
$\epsilon$ of states displayed the opposite "redundant" behavior,
then $\mathcal{I}(m)$ would have to be $O(\epsilon)$ at small
$m$. The fact that $\mathcal{I}(m)$ is exponentially close to
zero implies that the fraction of non-"encoding" states must
decline exponentially with $N_{\rm env}$.

The obvious conclusion is that the Universe does not 
evolve into random states. Our observations of ubiqui- 
tous redundancy in the real Universe are inconsistent 
with the random-state model. This is interesting, but 
not terribly surprising. There is no good reason to ex- 
pect that the Universe's state would be random - we are 
not, for instance, in thermodynamic equilibrium. The 
interactions of systems with their environments must se- 
lect states that are characterized by greater redundancy. 
In the next section, we suggest and analyze such an en- 
semble. 



III. DECOHERENCE AND BRANCHING 
STATES 

Decoherence - the loss of information to the environment - is a
prerequisite for redundancy. The simplest models of decoherence
[?, 30] are essentially identical to those for quantum
measurements. A set of pointer states for the system,
$\{|n\rangle\}$, are singled out, and the environment "measures"
which $|n\rangle$ the system is in, by evolving from some initial
state ($|\mathcal{E}_0\rangle$) into a conditional state
$|\mathcal{E}_n\rangle$. If $\rho_\mathcal{S}$ is written out in
the pointer basis, its diagonal elements ($\rho_{nn}$) remain
unchanged. Coherences between different pointer states (e.g.,
$\rho_{nm}$) are reduced by a decoherence factor:

$\gamma_{nm} = \langle \mathcal{E}_n | \mathcal{E}_m \rangle.$   (9)

[Figure 3, panels (a) and (b); horizontal axis: m (No. of
subenvironments captured).]

FIG. 3: (Color) Partial information plots (PIPs) for the uniform
ensemble. We plot the average information ($\mathcal{I}$)
obtainable from a fragment ($\mathcal{E}_{\{m\}}$), against the
fragment's size ($m$). $\mathcal{I}(m)$ is averaged over all
states in the uniform ensemble. (a): A qubit system coupled to
environments consisting of $N_{\rm env} = 2 \ldots 16$ qubits.
(b): Systems with sizes $D_\mathcal{S} = 2 \ldots 16$ coupled to a
16-qubit environment. Discussion: No significant information is
obtained until almost half the subenvironments have been
captured. Once $m > N_{\rm env}/2$, virtually all possible
information (both quantum and classical) is available. Because
more than half the environment is required to obtain useful
information, there is no redundant information storage in typical
uniformly-distributed states. Instead, the information is encoded
throughout the environment.

[Figure 4, panels (a) and (b); horizontal axes: (a) m (No. of
subenvironments captured), (b) captured information capacity
(bits).]

FIG. 4: (Color) Equivalent environments. When the state of the
universe is chosen randomly, the environment's Hilbert space
dimension determines its information-recording properties. (a):
PIPs for a 16-d system coupled to several equivalent environments
with $D_{\rm total}$ = 2^**. The subenvironments are
{2, 4, 8, 16}-dimensional, and $N_{\rm env}$ is scaled
appropriately. The plots are essentially identical - only the
scaling of the $m$-axis changes. (b): The same data, but with the
captured fraction of the environment plotted on the independent
axis.

FIG. 5: (Color) Scaled versions (SPIPs) of the plots in Fig. 3.
SPIPs are useful for comparing environments with different
numbers of subenvironments, and for computing $R_\delta$, the
redundancy for a given fraction $1 - \delta$ of the total
information. To estimate redundancy, simply draw a horizontal
line at the value of $f_I$ that corresponds to the required
fraction $1 - \delta$ of the information, and note the value of
$f_{\rm cap}$ where it intersects the PIP. This provides a good
estimate of $1/R_\delta$. It is not a perfect estimate for
several reasons; most importantly, the PIP and SPIP plot the
average $\mathcal{I}$ obtained from a given-sized fragment of the
environment. This is not the same as the average fragment size
($m$) required to obtain $\mathcal{I}$, since we average the same
data over different variables. In these plots, of course, no
redundancy is evident - we are looking ahead to the next section.

We presume that (a) the subenvironments are initially
unentangled, (b) each subenvironment "measures" the same basis of
the system, and (c) the state of the universe is pure. In this
simple model, the universe is initially in a product state:

$|\Psi_0\rangle = |\mathcal{S}_0\rangle \otimes |\mathcal{E}_0^{(1)}\rangle \otimes |\mathcal{E}_0^{(2)}\rangle \otimes \cdots \otimes |\mathcal{E}_0^{(N_{\rm env})}\rangle.$   (10)

The subenvironments do not interact with each other, and the
system does not evolve on its own. Letting the system's initial
state be $|\mathcal{S}_0\rangle = \sum_n s_n |n\rangle$, the
universe evolves over time into:

$|\Psi_t\rangle = \sum_n s_n\, |n\rangle_\mathcal{S} \otimes |\mathcal{E}_n^{(1)}\rangle \otimes |\mathcal{E}_n^{(2)}\rangle \otimes \cdots \otimes |\mathcal{E}_n^{(N_{\rm env})}\rangle,$   (11)

where $|\mathcal{E}_n^{(j)}\rangle$ is the conditional state into
which the $j$th subenvironment evolves if the system is in state
$|n\rangle$. Different conditional states of a given
subenvironment will not generally be orthogonal to one another,
except in highly simplified (e.g. C-NOT) models.
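Because the conditional environment states in this model are products over subenvironments, the overlap between two of them factorizes into a product of single-subenvironment overlaps. The sketch below is our illustration of that fact, using only the standard library. It draws random conditional states (an assumption made only for illustration; the helper names are hypothetical) and tracks the magnitude of the decoherence factor as subenvironments are added.

```python
import math
import random

def random_state(dim, rng):
    """Random pure state: a normalized vector of complex Gaussian amplitudes."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

def overlap(bra, ket):
    """Inner product <bra|ket>."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

def decoherence_factor(states_n, states_m):
    """Overlap of two product states: the product of per-subenvironment overlaps."""
    gamma = 1.0 + 0.0j
    for e_n, e_m in zip(states_n, states_m):
        gamma *= overlap(e_n, e_m)
    return gamma

rng = random.Random(1)
n_env, d_env = 12, 2
branch_0 = [random_state(d_env, rng) for _ in range(n_env)]
branch_1 = [random_state(d_env, rng) for _ in range(n_env)]

# |gamma_01| after the first k subenvironments have interacted, k = 0..n_env.
magnitudes = [abs(decoherence_factor(branch_0[:k], branch_1[:k]))
              for k in range(n_env + 1)]
```

Each additional subenvironment multiplies the magnitude by a number no larger than one, so coherence between pointer states can only shrink as the environment grows.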



A. The branching-state ensemble 

We refer to the states defined by Eq. 11 as singly-branching
states, or simply as branching states. In Everett's many-worlds
interpretation [?], a branching state's wavefunction has
$D_\mathcal{S}$ branches. Each branch is perfectly correlated
with a particular pointer state of the system. The
subenvironments are not entangled with each other, only
correlated (classically) via the system. In contrast, a typical
random state from the uniform ensemble has $D_{\rm universe}$
branches, with a new branching at every subsystem.

In dynamical models of decoherence, the universe at a given time
will be described by a particular branching state that depends on
the environment's initial state, and on its dynamics. In this
paper, we sidestep the difficulties of specifying these
parameters, by considering the ensemble of all branching states.
We select the conditional $|\mathcal{E}_n^{(j)}\rangle$ at random
from each subenvironment's uniform ensemble. Each pointer state
of the system is correlated with a randomly chosen product state
of all the environments.

The amount of available information is set by the system's
initial state (i.e., the $s_n$ coefficients). The eigenvalues of
$\rho_\mathcal{S}$ after complete decoherence, which determine
its maximum entropy, are $\lambda_n = |s_n|^2$. Since we cannot
examine all possible states, we focus on maximally "measurable"
generalized Hadamard states:

$s_n = \frac{1}{\sqrt{D_\mathcal{S}}} \quad \forall\, n.$   (12)

To verify that our results are generally valid, we also treat 
(briefly) another class of initial states. 

By examining the branching-state ensemble, we are 
not conjecturing that the Universe is found exclusively 
in branching states. Branching states form an interest- 
ing and physically well-motivated ensemble to explore. 
We shall see that, unlike the uniform ensemble, the 
branching-state ensemble displays redundancy consistent 
with observations of the physical Universe. Our Universe 
might well tend to evolve into similar states, but we are 
not ready to establish such a conjecture. Characteriz- 
ing the states in which the physical Universe (or a frag- 
ment thereof) is found is a substantially more ambitious 
project. 

B. Numerical analysis of branching states 

We begin our exploration of branching states by examining typical
PIPs, for various systems and environments. We average these PIPs
over the branching-state ensemble, so there are only three
adjustable parameters: $D_\mathcal{S}$, $D_\mathcal{E}$, and
$N_{\rm env}$. Our results confirm that information is stored
redundantly. Next, we examine a quantitative measure of
redundancy ($R_\delta$), and its dependence on $D_\mathcal{S}$,
$D_\mathcal{E}$, and $N_{\rm env}$. Finally, we derive some
analytical approximations, compare them with numerical data, and
discuss the implications of our results.

FIG. 6: (Color) PIPs for ensembles of singly-branching states.
The system is initialized in a Hadamard state, and decohered by
$N_{\rm env}$ subenvironments. We plot the average information
($\mathcal{I}$) available from a collection of $m$
subenvironments. (a): A qubit is decohered by qubits. (b): A
qubit is decohered by 5-dimensional subenvironments. (c): A 5-d
system is decohered by qubits. (d): A 5-d system is decohered by
5-d subenvironments. Discussion: As $N_{\rm env}$ is increased
from 4 to 12, a "classical plateau" appears. This indicates
redundant information storage. In the regime $m \ll N_{\rm env}$,
the PIP converges to an asymptotic form. When $D_\mathcal{S}$ is
larger than $D_\mathcal{E}$ (see (c)), the environment is barely
sufficient to decohere the system, and there is no redundancy
(see also Fig. ??).



1. Partial information plots 

Information is redundant when small fragments yield
nearly-complete information - that is, when the PIP looks like
Fig. 2b. PIPs for branching states (Fig. 6) show exactly this
profile. $\mathcal{I}(m)$ rises rapidly from $\mathcal{I}(0) = 0$,
then approaches $H_\mathcal{S}$ asymptotically to produce a
"classical plateau" centered at $m = N_{\rm env}/2$.

As $N_{\rm env}$ grows, the interesting regimes at $m \approx 0$
and $m \approx N_{\rm env}$ do not change; the classical plateau
simply extends to connect them. The initial bits of information
that an observer gains about a system are extremely useful, but
eventually a point of diminishing returns is reached, where
further information is redundant. The degree of redundancy should
therefore scale with $N_{\rm env}$.
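A single branching state can be built and probed directly. The sketch below is ours and assumes NumPy. It constructs one random branching state of the form of Eq. (11) with Hadamard amplitudes, then evaluates the mutual information between the system and the first m subenvironments. Unlike the PIPs discussed here, it does not average over the ensemble or over fragments, so it should be read as a single sample, and all function names are hypothetical.

```python
import numpy as np

def random_state(dim, rng):
    """Random pure state: normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def branching_state(d_sys, d_env, n_env, rng):
    """sum_n s_n |n> (x) |E_n^(1)> (x) ... (x) |E_n^(n_env)>, with
    s_n = 1/sqrt(d_sys) and random conditional states (axis 0 is the system)."""
    psi = np.zeros((d_sys,) + (d_env,) * n_env, dtype=complex)
    for n in range(d_sys):
        branch = np.array([1.0 + 0.0j])
        for _ in range(n_env):
            branch = np.kron(branch, random_state(d_env, rng))
        psi[n] = branch.reshape((d_env,) * n_env) / np.sqrt(d_sys)
    return psi

def entropy(psi, keep):
    """Von Neumann entropy (bits) of the tensor factors `keep` of a pure state."""
    keep = list(keep)
    if not keep or len(keep) == psi.ndim:
        return 0.0
    rest = [i for i in range(psi.ndim) if i not in keep]
    d_keep = int(np.prod([psi.shape[i] for i in keep]))
    matrix = psi.transpose(keep + rest).reshape(d_keep, -1)
    p = np.linalg.svd(matrix, compute_uv=False) ** 2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def information(psi, fragment):
    """Mutual information between the system (axis 0) and a fragment."""
    fragment = list(fragment)
    return entropy(psi, [0]) + entropy(psi, fragment) - entropy(psi, [0] + fragment)

rng = np.random.default_rng(0)
n_env = 8
psi = branching_state(d_sys=2, d_env=2, n_env=n_env, rng=rng)
h_sys = entropy(psi, [0])
pip = [information(psi, range(1, m + 1)) for m in range(n_env + 1)]
```

Printing `pip` next to `h_sys` shows how quickly a growing fragment approaches the system's entropy for this particular draw. Averaging over many draws and over random fragments would be needed to reproduce an ensemble PIP.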

FIG. 7: (Color) PIPs for non-Hadamard states: $D_\mathcal{E}$ =
2, 3, 4, 5 in plots (a), (b), (c), (d), respectively. The system
is 16-dimensional, and initialized in a "thermal" state, where
$s_n \propto 2^{-n/2}$. The entropy of this density matrix is
approximately 2 bits (as opposed to 4 bits for a
$D_\mathcal{S} = 16$ Hadamard state). We compare the PIPs for
"thermal" states with $D_\mathcal{S} = 16$ to PIPs for Hadamard
$D_\mathcal{S} = 4$ states, which also develop 2 bits of entropy,
varying the subenvironments' size. These PIPs confirm that our
observations apply to non-Hadamard states, and that
$H_\mathcal{S}$ characterizes how information about the system is
stored.



The post-decoherence spectrum of ps is non-degenerate 
- in fact, it is exactly that of a thermal spin - i.e., a 
particle with a Hamiltonian H = J^, in equilibrium with 
a bath at finite temperature. We refer to these states as 
"thermal" branching states (and retain quotation marks 
to emphasize that our justification of this nomenclature 
is unphysical). 

Our general approach is to assume that the system's 
maximum entropy determines its informational proper- 
ties. The entropy of a decohered "thermal" state does 
not increase logarithmically with Dg, but asymptotes 
to — 2 bits. This is exactly the entropy of a 
Dg — 4 Hadamard state, so in the limit Dg — > oo, "ther- 
mal" states should behave much the same as a Dg — 4 
Hadamard state. 

This conjecture is confirmed in Fig. 7, which compares PIPs for "thermal" states with $D_\mathcal{S} = 16$ to PIPs for Hadamard states with $D_\mathcal{S} = 4$. The plots' similarity indicates that $H_\mathcal{S}$ is the major factor in how information about $\mathcal{S}$ is recorded. Further numerical results use Hadamard states for specificity's sake.
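The 2-bit asymptote quoted for "thermal" states can be checked with a short calculation. The sketch below assumes, purely for illustration, a geometric spectrum $|s_n|^2 \propto 2^{-n}$; the expression that defines the "thermal" states is not legible in this copy, so this form is a guess and not necessarily the one used for Fig. 7. It compares the resulting entropy with that of a uniform spectrum, which is what a fully decohered Hadamard state produces. All function names are hypothetical.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def geometric_spectrum(dim):
    """ASSUMED 'thermal' spectrum: p_n proportional to 2**(-n), n = 1..dim."""
    weights = 2.0 ** -np.arange(1, dim + 1)
    return weights / weights.sum()

def uniform_spectrum(dim):
    """Spectrum of a fully decohered Hadamard state: dim equal eigenvalues."""
    return np.full(dim, 1.0 / dim)

h_thermal_16 = entropy_bits(geometric_spectrum(16))   # close to 2 bits
h_hadamard_4 = entropy_bits(uniform_spectrum(4))      # log2(4) bits
h_hadamard_16 = entropy_bits(uniform_spectrum(16))    # log2(16) bits
```

Under this assumed spectrum, a 16-dimensional "thermal" state carries essentially the same entropy as a 4-dimensional Hadamard state, which is the comparison the text relies on.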



2. Non-Hadamard states for $\mathcal{S}$

Non-Hadamard states provide a different spectrum of information for $\mathcal{E}$ to capture. We consider states defined by




3. How PIPs scale with the composition of $\mathcal{E}$

As the number of subenvironments in $\mathcal{E}$ grows, comparing PIPs for different environments becomes difficult. Re-parameterizing the axes, and plotting the fraction of $\mathcal{I}$ available from a fraction of $\mathcal{E}$, allows direct comparison of different universes. Scaled PIPs (SPIPs) for environments with $N_{env} = 4 \ldots 128$ (Fig. 8(a)) show that the information about $\mathcal{S}$ becomes more redundant as $N_{env}$ grows.

FIG. 8: (Color) Scaled partial information plots (SPIPs) compare information storage in different environments. (a): A qutrit system coupled to $N_{env} = 4 \ldots 128$ qutrit environments. (b): A qutrit system coupled to nine different environments with the same information capacity. Discussion: As $N_{env}$ increases, redundancy (indicated by sharp curvature) grows (plot (a)). If $N_{env}$ and $D_\mathcal{E}$ are scaled so that total Hilbert space dimension ($D_\mathcal{E}^{N_{env}}$) remains constant, then the SPIP remains unchanged (plot (b)). Plot (b) also illustrates the difference between the regime of linear information gain (here, $f_{cap} < 0.04$) and the exponential convergence to the "classical plateau" thereafter.

Different environments, whose total Hilbert space dimensions are the same, act equivalently (see also Sec. III B 1). We have simulated a 16-dimensional system coupled to nine different, but equivalent, environments (Fig. 8(b)). Although the number and size of the subenvironments are varied, the redundancy of the available information depends only on $\mathcal{E}$'s total information capacity: $c = \log[\dim(\mathcal{H}_\mathcal{E})]$. Each $\mathcal{E}$ in Fig. 8(b) has $c \approx 120$ bits, so their SPIPs are essentially identical.
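The capacity bookkeeping is simple enough to sketch. The helper below computes $c = N_{env}\log_2 D_\mathcal{E}$ and lists several environments that share $c = 120$ bits. The function name and the particular $(N_{env}, D_\mathcal{E})$ pairs are illustrative choices and are not the nine environments simulated for Fig. 8(b).

```python
import math

def capacity_bits(n_env, d_env):
    """Information capacity c = log2(dim H_E) = n_env * log2(d_env), in bits."""
    return n_env * math.log2(d_env)

# Hypothetical environments chosen so that all share the same capacity.
environments = [(120, 2), (60, 4), (40, 8), (30, 16), (24, 32)]
capacities = [capacity_bits(n, d) for n, d in environments]
```

Trading many small subenvironments for fewer large ones leaves $c$ fixed, which is the sense in which these environments are "equivalent".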



4. Redundancy: numerical values

Branching states are natural generalizations of GHZ states, so we expect redundant information storage. Figure 9 confirms this over a wide range of parameters. The amount of redundancy is proportional to the size of the environment, which agrees with the classical intuition that very large environments should store many copies of information about the system. Larger subenvironments (measured by $D_\mathcal{E}$) increase redundancy by storing more information in each subenvironment. Conversely, larger systems have more properties to measure, which in turn require more space for information storage. The total amount of redundancy is reduced for large $D_\mathcal{S}$.

The other important feature of the plots in Fig. 9 is the relatively weak dependence of $R_\delta$ on the information deficit ($\delta$). As we vary $\delta$ from 2% to 25% (a full order of magnitude), $R_\delta$ changes by less than a factor of 2. The distinction between classical (massively redundant) and quantum (nonredundant) information is largely independent of $\delta$.



IV. THEORETICAL ANALYSIS OF 
BRANCHING STATES 

The numerical analysis in the previous section offers compelling evidence that

1. Information is stored redundantly in branching states,

2. The amount of redundancy scales with $N_{env}$, and

3. $R_\delta$ is relatively insensitive to $\delta$.

In this section, we construct theoretical models for PIPs and redundancy, which confirm these hypotheses.



A. Structural properties of branching states 

We begin by using the structure inherent to branching states to compute a quantity of fundamental interest,

$$\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}} = H_\mathcal{S} + H_{\mathcal{E}_{\{m\}}} - H_{\mathcal{S}\mathcal{E}_{\{m\}}}, \qquad (14)$$

the mutual information between the system and a partial environment $\mathcal{E}_{\{m\}}$.

We require the entropies of $\rho_\mathcal{S}$, $\rho_{\mathcal{E}_{\{m\}}}$, and $\rho_{\mathcal{S}\mathcal{E}_{\{m\}}}$. Tracing over the rest of the universe is simplified by the structure that the branching-state definition implies. Each relevant density matrix (regardless of its actual dimension) has only $D_\mathcal{S}$ nonzero eigenvalues. That is, the reduced states for $\mathcal{S}$, $\mathcal{E}_{\{m\}}$, and $\mathcal{S}\mathcal{E}_{\{m\}}$ are all "virtual qudits" with $D = D_\mathcal{S}$.

FIG. 9: (Color) Redundancy for an assortment of branching-state ensembles. (a): $R_{10\%}$ for a $D$-dimensional system decohered by $D$-dimensional subenvironments. (b): $R_{10\%}$ for a 5-dimensional system decohered by $D_\mathcal{E} = 2 \ldots 5$-dimensional subenvironments. (c): $R_{10\%}$ for a $D_\mathcal{S} = 2 \ldots 16$-dimensional system decohered by 4-dimensional subenvironments. (d): $R_\delta$ for $\delta = 0.001 \ldots 0.25$ and $D_\mathcal{S} = D_\mathcal{E} = 5$. Discussion: Each plot shows the ensemble-average of $R_\delta$, as a function of $N_{env}$. $R_\delta$ increases linearly with the number of environments. $R_\delta$ increases with $D_\mathcal{E}$, but decreases with $D_\mathcal{S}$. Larger environments store more information, which leads to greater redundancy - but larger systems have more information to be stored. Information is stored with slightly greater efficiency for large $D_\mathcal{S}$ and $D_\mathcal{E}$ (plot (a)). Note that if $\mathcal{S}$ is larger than $\mathcal{E}$ (e.g., $D_\mathcal{S} = 16$ in plot (c)), there may be no redundancy. Finally, $\delta$ affects redundancy (plot (d)) - but varying $\delta$ by a full order of magnitude (from 2% to 25%) changes $R_\delta$ by less than 50%.



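Each $\rho$, when reduced to its $D_\mathcal{S}$-dimensional support, is spectrally equivalent to a partially decohered variant of the system's initial state:

$$|\mathcal{S}_0\rangle\langle\mathcal{S}_0| = \sum_{n,m} s_n s_m^* \, |n\rangle\langle m|. \qquad (15)$$

In other words, we can obtain $\rho_\mathcal{S}$, $\rho_{\mathcal{E}_{\{m\}}}$, or $\rho_{\mathcal{S}\mathcal{E}_{\{m\}}}$ by taking $|\mathcal{S}_0\rangle\langle\mathcal{S}_0|$ and suppressing the off-diagonal elements according to a specific rule.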

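To determine this rule, we define (for each subenvironment) a multiplicative decoherence factor, $\gamma$:

$$\gamma^{(k)}_{ij} \equiv \langle \mathcal{E}^{(k)}_j | \mathcal{E}^{(k)}_i \rangle, \qquad (16)$$

where $|\mathcal{E}^{(k)}_i\rangle$ denotes the state of the $k$th subenvironment that is correlated with the system state $|i\rangle$, and an associated additive decoherence factor, $d$:

$$d^{(k)}_{ij} \equiv -\log\left|\gamma^{(k)}_{ij}\right|. \qquad (17)$$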



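Now, $\gamma^{(k)}_{ij}$ quantifies how much $\mathcal{E}_k$ contributes to decohering $|i\rangle$ from $|j\rangle$. The $\gamma$-factors from different $\mathcal{E}_k$ combine multiplicatively; the $d$-factors provide a convenient additive representation. Each relevant density matrix $\rho_X$ (for $X \in \{\mathcal{S}, \mathcal{E}_{\{m\}}, \mathcal{S}\mathcal{E}_{\{m\}}\}$) has its off-diagonal elements suppressed according to its own $d$-factor:

$$\left|(\rho_X)_{ij}\right| = |s_i s_j^*| \, e^{-d^{X}_{ij}}. \qquad (18)$$

The $d$-factor for each subsystem is a sum over $d$-factors for the component $\mathcal{E}_k$:

$$d^{\mathcal{S}}_{ij} = \sum_{k} d^{(k)}_{ij}, \qquad (19)$$

$$d^{\mathcal{S}\mathcal{E}_{\{m\}}}_{ij} = \sum_{k \notin \{m\}} d^{(k)}_{ij}, \qquad (20)$$

$$d^{\mathcal{E}_{\{m\}}}_{ij} = \sum_{k \in \{m\}} d^{(k)}_{ij}. \qquad (21)$$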



Thus, each $\rho$ appears to have been decohered by a different subset of $\mathcal{E}$:

• $\rho_\mathcal{S}$ has been decohered by every subenvironment,

• $\rho_{\mathcal{S}\mathcal{E}_{\{m\}}}$ has been decohered by all the subenvironments not in $\mathcal{E}_{\{m\}}$,

• $\rho_{\mathcal{E}_{\{m\}}}$ has been decohered by all the subenvironments in $\mathcal{E}_{\{m\}}$.

Note: If the last point seems counter-intuitive, recall that for any bipartite decomposition of a pure state $|\Psi\rangle_{AB}$, the reduced $\rho_A$ and $\rho_B$ are spectrally equivalent. Thus $\rho_{\mathcal{E}_{\{m\}}}$ is spectrally equivalent to $\rho_{\mathcal{S}\bar{\mathcal{E}}_{\{m\}}}$, where $\bar{\mathcal{E}}_{\{m\}}$ contains all the environments not in $\mathcal{E}_{\{m\}}$.

Computing $\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}}$ (in terms of the entropies of these three states) can be done exactly via numerical diagonalization. For qubit systems, it can also be done analytically (see [23] for extensive details). For our model, we now derive an approximation for $H(\rho)$.
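The three bullet points translate directly into a numerical recipe. The sketch below is an illustration under stated assumptions, not the authors' code: it draws Haar-random subenvironment states (the helper names, the uniform system amplitudes and the dimensions are all hypothetical choices), forms the overlap matrices $\gamma^{(k)}_{ij}$, and obtains each entropy by diagonalizing the correspondingly decohered $D_\mathcal{S} \times D_\mathcal{S}$ matrix.

```python
import numpy as np

def random_unit_vectors(rng, count, dim):
    """Rows are Haar-random pure states of dimension `dim`."""
    z = rng.normal(size=(count, dim)) + 1j * rng.normal(size=(count, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def entropy_bits(rho):
    """Von Neumann entropy (bits) of a Hermitian, unit-trace matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def decohered_state(s, gammas, subset):
    """|S0><S0| with off-diagonals multiplied by gamma^(k) for k in subset."""
    rho = np.outer(s, s.conj())
    for k in subset:
        rho = rho * gammas[k]          # element-wise product
    return rho

def mutual_information(s, gammas, fragment):
    """I(S : fragment) = H_S + H_fragment - H_(S, fragment), in bits."""
    everyone = range(len(gammas))
    rest = [k for k in everyone if k not in fragment]
    h_s = entropy_bits(decohered_state(s, gammas, everyone))
    h_e = entropy_bits(decohered_state(s, gammas, fragment))  # decohered by fragment
    h_se = entropy_bits(decohered_state(s, gammas, rest))     # decohered by the rest
    return h_s + h_e - h_se

rng = np.random.default_rng(0)
d_sys, d_env, n_env = 4, 3, 8                 # hypothetical sizes
s = np.full(d_sys, 1 / np.sqrt(d_sys), dtype=complex)
gammas = []
for _ in range(n_env):
    e = random_unit_vectors(rng, d_sys, d_env)
    gammas.append(e @ e.conj().T)             # gamma[i, j] = <E_j|E_i>

h_s = entropy_bits(decohered_state(s, gammas, range(n_env)))
pip = [mutual_information(s, gammas, list(range(m))) for m in range(n_env + 1)]
```

Here `pip[m]` is the information available from the first `m` subenvironments of one sample; averaging over fragments and over the ensemble would give a partial information plot.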



B. Theoretical PIPs: averaging $\mathcal{I}(m)$

As a particular $\rho$ is decohered by more and more subenvironments, its off-diagonal elements decline rapidly toward zero. We will treat the off-diagonal elements of a partially decohered state, $\rho = \sum_{ij} s_i s_j^* \gamma_{ij} |i\rangle\langle j|$, as a perturbation around the fully decohered state $\rho_0$, which has eigenvalues $\lambda_i = |s_i|^2$ and entropy $H_0$.



1. Average entropy of partially decohered states

Let $\rho = \rho_0 + \Delta$, where $\Delta$ is a small off-diagonal perturbation to $\rho_0$, and expand its entropy as $H(\rho) \approx H(\rho_0) + O(\Delta)$. An intuitively appealing starting point is the Maclaurin expansion of $H(x) = -x\ln(x)$, which yields

$$H(\rho_0 + \Delta) \approx H(\rho_0) - \mathrm{Tr}\left[\Delta\left(\mathbb{1} + \ln\rho_0\right)\right] - \frac{1}{2}\frac{\Delta^2}{\rho_0} + \frac{1}{6}\frac{\Delta^3}{\rho_0^2} + \ldots \qquad (22)$$

The first order term in Eq. 22 vanishes, because $\Delta$ is purely off-diagonal and $\mathbb{1} + \ln(\rho_0)$ is purely diagonal. The leading term is thus $-\frac{1}{2}\frac{\Delta^2}{\rho_0}$ - but the matrix quotient $\frac{\Delta^2}{\rho_0}$ is ill-defined when $\Delta$ and $\rho_0$ do not commute.

A more involved expansion of $H(\rho)$ around $\rho = \mathbb{1}$ (see Appendix C) yields a series for $H(\rho_0 + \Delta)$. It is equivalent to Eq. 22 for scalars, but for matrices it involves (1) expanding $\rho_0^{-n}$ in a power series, and (2) taking a totally symmetric product between $\Delta^{n+1}$ and the resulting power series.

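To leading order in $\Delta$,

$$H(\rho) \approx H(\rho_0) - \frac{\overline{|\gamma|^2}}{2}\left(h(\rho_0) - 1\right), \qquad (23)$$

where $\overline{|\gamma|^2}$ is the average of $|\gamma_{ij}|^2$ over all $i \neq j$, and $h(\rho_0)$ is a nontrivial function,

$$h(\rho_0) = \sum_{n,p=0}^{\infty} \frac{\mathrm{Tr}\left[\rho_0(\mathbb{1}-\rho_0)^p\right]\,\mathrm{Tr}\left[\rho_0(\mathbb{1}-\rho_0)^n\right]}{n+p+1}. \qquad (24)$$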
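The double sum in Eq. 24 is easy to evaluate numerically once $\rho_0$ is diagonal, because each trace reduces to $\sum_i \lambda_i (1-\lambda_i)^p$ on the support. The sketch below truncates the sum at a cutoff `n_max`; the function name and the cutoff are illustrative assumptions, and the truncation is only adequate when no eigenvalue is extremely small.

```python
import numpy as np

def h_series(eigenvalues, n_max=400):
    """Truncated evaluation of h(rho_0) for a diagonal rho_0 (Eq. 24)."""
    lam = np.asarray(eigenvalues, dtype=float)
    lam = lam[lam > 0]                       # restrict to the support
    powers = np.arange(n_max)
    # a[p] = Tr[rho_0 (1 - rho_0)^p], evaluated on the support
    a = np.array([np.sum(lam * (1.0 - lam) ** p) for p in powers])
    denom = powers[:, None] + powers[None, :] + 1
    return float(np.sum(np.outer(a, a) / denom))

h_mixed = h_series([0.25] * 4)   # maximally mixed state on 4 levels
h_pure = h_series([1.0])         # pure state
```

For a maximally mixed state on $D$ levels the series sums to $D$, and for a pure state it gives 1; these are the two cases in which the effective-dimension approximation is exact.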



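2. Effective Hilbert space dimension

In general, $h(\rho_0)$ cannot be simplified further. However, it is well approximated by the effective Hilbert space dimension of $\rho_0$. To see this, we consider the special case where $\rho_0$ has $D$ identical eigenvalues, $\lambda_i = \frac{1}{D}$. When reduced to its support, $\rho_0 = \frac{\mathbb{1}}{D}$. The summation can be done explicitly:

$$\begin{aligned}
h(\rho_0) &= \sum_{n,p=0}^{\infty} \frac{\mathrm{Tr}\left[\frac{\mathbb{1}}{D}\left(\mathbb{1}-\frac{\mathbb{1}}{D}\right)^p\right]\mathrm{Tr}\left[\frac{\mathbb{1}}{D}\left(\mathbb{1}-\frac{\mathbb{1}}{D}\right)^n\right]}{n+p+1} \\
&= \sum_{n,p=0}^{\infty} \frac{\left(1-D^{-1}\right)^p\left(1-D^{-1}\right)^n}{n+p+1} \\
&= \sum_{n,p=0}^{\infty} \frac{\left(1-D^{-1}\right)^{n+p}}{n+p+1} \\
&= \sum_{N=0}^{\infty} \left(1-\frac{1}{D}\right)^N = D. \qquad (25)
\end{aligned}$$

Note that $D$ appeared only based on the eigenvalue spectrum of $\rho_0$. In the example above, $H_0 = H(\rho_0) = \log(D)$. Since the total range of $\mathcal{I}(m)$ is proportional to $H_0$, a logical generalization is

$$h(\rho_0) \approx e^{H_0}, \qquad (26)$$

$$H(\rho) \approx H(\rho_0) - \frac{\overline{|\gamma|^2}}{2}\left(e^{H_0} - 1\right). \qquad (27)$$

Numerical experimentation, and an analytic calculation in $D_\mathcal{S} = 2$, confirm that Eq. 26 is a good approximation everywhere, in addition to being exact for (1) maximally mixed states, and (2) pure states.

3. Average decoherence factors

The $\gamma_{ij}$ depend on the details of $|\Psi_{\mathcal{S}\mathcal{E}}\rangle$. However, when they are small enough to count as a perturbation on $\rho$, the environment's Hilbert space is very large. The $|\gamma_{ij}|^2$ can then be treated as independent random variables, so $\overline{|\gamma|^2}$ is equal to an average over the entire branching state ensemble:

$$\overline{|\gamma_{ij}|^2} = \mathrm{Tr}\left(\overline{|\psi\rangle\langle\psi|}\;\overline{|\psi'\rangle\langle\psi'|}\right) = \frac{\mathrm{Tr}\left(\mathbb{1}\,\mathbb{1}\right)}{D_\mathcal{E}^2} = \frac{1}{D_\mathcal{E}}. \qquad (28)$$

This is the mean value of $|\gamma_{ij}|^2$ for a single subenvironment. For a collection of $m$ subenvironments, $m$ such factors are multiplied together, so the mean value of $|\gamma_{ij}|^2$ becomes $D_\mathcal{E}^{-m}$.

4. The result

Putting this all together, the average entropy of a $D_\mathcal{S}$-dimensional system decohered by $m$ $D_\mathcal{E}$-dimensional environments is

$$\bar{H} \approx H_0 - \frac{1}{2} D_\mathcal{E}^{-m}\left(e^{H_0} - 1\right), \qquad (29)$$

and the average mutual information between the system and $m$ subenvironments is

$$\begin{aligned}
\bar{\mathcal{I}}(m) &\approx H_0 - \frac{1}{2}\left(e^{H_0} - 1\right)\left(D_\mathcal{E}^{-m} - D_\mathcal{E}^{m - N_{env}}\right) \\
&= H_0 + \left(e^{H_0} - 1\right) D_\mathcal{E}^{-N_{env}/2} \sinh\left[\left(m - \frac{N_{env}}{2}\right)\ln(D_\mathcal{E})\right]. \qquad (30)
\end{aligned}$$

Equation 30 is a good approximation only near the classical plateau, where $\mathcal{I} \approx H_0$. Around $m = 0$ and $m = N_{env}$, $\mathcal{I}$ rises linearly, not exponentially. Each subenvironment can provide only $\log_2 D_\mathcal{E}$ bits of information, so until the information starts to become redundant, we're in a different regime (see Fig. 8(b)).

Once the information capacity of the captured environments ($m \log D_\mathcal{E}$) becomes greater than the amount of information in the system ($H_0$), Eq. 30 becomes valid. It describes the slow approach to "perfect" information about the system, as $m$ increases. Figure 10 compares exact (numerical) results for $\mathcal{I}(m)$ to the approximation in Eq. 30.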
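Two ingredients of this result lend themselves to a quick numerical check. The sketch below (i) estimates the mean squared overlap of two Haar-random states by Monte Carlo, which should approach $1/D_\mathcal{E}$ as in Eq. 28, and (ii) evaluates a closed form for the plateau approximation. The closed form coded in `pip_estimate`, $H_0 + (e^{H_0}-1)\,D_\mathcal{E}^{-N_{env}/2}\sinh[(m - N_{env}/2)\ln D_\mathcal{E}]$ with $H_0$ in nats, is our reading of Eq. 30 and should be treated as an assumption; the function names, sample count and seed are likewise hypothetical.

```python
import numpy as np

def mean_overlap_sq(dim, samples=20000, seed=0):
    """Monte Carlo estimate of the mean |<a|b>|^2 for Haar-random a, b."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(samples, dim)) + 1j * rng.normal(size=(samples, dim))
    b = rng.normal(size=(samples, dim)) + 1j * rng.normal(size=(samples, dim))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.abs(np.sum(a.conj() * b, axis=1)) ** 2))

def pip_estimate(m, n_env, d_env, h0):
    """ASSUMED form of the plateau approximation; h0 is in nats."""
    x = (m - n_env / 2.0) * np.log(d_env)
    return h0 + (np.exp(h0) - 1.0) * d_env ** (-n_env / 2.0) * np.sinh(x)

overlap_4 = mean_overlap_sq(4)      # should be close to 1/4
h0 = np.log(4.0)                    # maximally mixed 4-level system
plateau = pip_estimate(8, 16, 2, h0)
```

The estimate equals $H_0$ at $m = N_{env}/2$ and is antisymmetric about that point, but it does not reach $2H_0$ at $m = N_{env}$, consistent with the statement that the approximation holds only near the plateau.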



C. Theoretical redundancy: averaging $m(\mathcal{I})$

Branching states develop when each subenvironment interacts independently with $\mathcal{S}$. The data in Section III B 4 (esp. Fig. 9) confirm that redundancy in branching states is proportional to $N_{env}$. A certain number of subenvironments ($m_\delta$) is enough to provide sufficient information.

FIG. 10: (Color) Numerical PIPs vs. Theory: We compare the approximation derived in Sec. IV B with numerics. Error bars on numerics represent typical fluctuations over the branching-state ensemble. (a): $D_\mathcal{S} = D_\mathcal{E} = 2$, $N_{env} = 8$. (b): $D_\mathcal{S} = D_\mathcal{E} = 2$, $N_{env} = 32$. (c): $D_\mathcal{S} = D_\mathcal{E} = 4$, $N_{env} = 8$. (d): $D_\mathcal{S} = D_\mathcal{E} = 4$, $N_{env} = 32$. (e): $D_\mathcal{S} = D_\mathcal{E} = 16$, $N_{env} = 8$. (f): $D_\mathcal{S} = D_\mathcal{E} = 16$, $N_{env} = 32$. Discussion: The approximation is virtually perfect near the classical plateau. For small $m$, the rate of information gain is more nearly linear, and the approximation fails. Although it works well at $m = 0$ for $D_\mathcal{S} \sim 4$ (plots (b),(e)), it fails spectacularly near $m = 0$ for large $D_\mathcal{S}$ (plots (c),(f)).

To capture this scaling, we define specific redundancy as

$$r_\delta = \lim_{N_{env}\to\infty} \frac{R_\delta}{N_{env}} = \frac{1}{m_\delta}. \qquad (31)$$



In this section, we use specific redundancy to examine precisely how $D_\mathcal{S}$, $D_\mathcal{E}$, and $\delta$ affect information storage in branching states. We derive an approximate formula for $r_\delta$, and compare its predictions to numerical data.

In the previous section, we computed the average information yielded by $m$ environments. Now, we compute the average $m$ required to achieve a given $\mathcal{I}$.

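When $N_{env}$ is large, $H_{\mathcal{S}\mathcal{E}_{\{m\}}} \approx H_\mathcal{S} \approx H_0$, so $\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}} \approx H_{\mathcal{E}_{\{m\}}}$. We take Eq. 27,

$$\mathcal{I} \approx H_\mathcal{S} - \frac{\overline{|\gamma|^2}}{2}\left(e^{H_\mathcal{S}} - 1\right), \qquad (32)$$

as a starting point. For the fragment to provide "sufficient" information, $H_\mathcal{S} - \mathcal{I}$ must be less than $\delta H_\mathcal{S}$, which requires

$$\frac{2}{D_\mathcal{S}(D_\mathcal{S}-1)} \sum_{i<j} |\gamma_{ij}|^2 \left(e^{H_\mathcal{S}} - 1\right) < 2\delta H_\mathcal{S}. \qquad (33)$$

Assuming $\rho_0$ is maximally mixed (i.e., $e^{H_0} = D_\mathcal{S}$), and replacing the $\gamma_{ij}$ with independent random variables $\gamma_n$, we obtain the following condition on a "sufficiently large" fragment:

$$\sum_{n=1}^{\frac{1}{2}D_\mathcal{S}(D_\mathcal{S}-1)} |\gamma_n|^2 < \delta D_\mathcal{S} H_\mathcal{S}. \qquad (34)$$

The interaction of $\frac{1}{2}D_\mathcal{S}(D_\mathcal{S} - 1)$ independent $\gamma$-factors makes it difficult to solve Eq. 34 rigorously. We begin instead by considering a qubit system, which has only one off-diagonal $\gamma$.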


1. Specific redundancy for qubit systems

For a single qubit, there is only one decoherence factor: $d_{01}$, which we'll refer to simply as $d$. Eq. 34 simplifies to:

$$d > d_\delta \equiv -\frac{1}{2}\log\left(2\delta H_\mathcal{S}\right). \qquad (35)$$

The increase in $d$ with $m$ can be approximated as a biased random walk, where each step has a mean length ($\bar{d}$) and a variance ($\Delta d^2$). After $m$ environments are added to the fragment, $d$ obeys a normal distribution ($p_m(d)$), whose mean and standard deviation are $m\bar{d}$ and $\sqrt{m}\,\Delta d$, respectively. We postpone the calculation of $\bar{d}$ and $\Delta d$ for the moment.

Let $p_{suff}(m)$ be the probability that a fragment consisting of $m$ subenvironments provides sufficient information (i.e., satisfies Eq. 35). Then

$$p_{suff}(m) = \int_{d_\delta}^{\infty} p_m(d)\,\mathrm{d}d, \qquad (36)$$



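and the probability that $m$ environments are required is

$$\begin{aligned}
p_{req}(m) &= p_{suff}(m) - p_{suff}(m-1) \qquad (37) \\
&= \int_{m-1}^{m} \frac{\partial}{\partial n}\, p_{suff}(n)\,\mathrm{d}n, \qquad (38)
\end{aligned}$$

and the expected fragment size ($\bar{m}$) is

$$\begin{aligned}
\bar{m} &= \sum_{m=0}^{\infty} m\, p_{req}(m) \\
&\approx \int_0^{\infty} \left(m + \frac{1}{2}\right) \frac{\partial}{\partial m}\, p_{suff}(m)\,\mathrm{d}m \\
&= \frac{1}{2} + \int_0^{\infty} m\, \frac{\partial}{\partial m}\, p_{suff}(m)\,\mathrm{d}m \\
&= \frac{1}{2} + \int_0^{\infty} \left(1 - p_{suff}(m)\right)\mathrm{d}m \\
&= \frac{1}{2} + \int_0^{\infty} \mathrm{d}m \int_{-\infty}^{d_\delta} p_m(d)\,\mathrm{d}d. \qquad (39)
\end{aligned}$$

We interchange the order of integration, substitute the appropriate normal distribution for $p_m(d)$, and end up with

$$\bar{m} = \frac{d_\delta}{\bar{d}} + \frac{\Delta d^2}{2\bar{d}^2} + \frac{1}{2}. \qquad (40)$$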
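The random-walk picture can be tested directly by simulation. The sketch below is a Monte Carlo illustration under stated assumptions: each step is $-\frac{1}{2}\ln|\gamma|^2$, where $|\gamma|^2$ is the squared overlap of two Haar-random states in dimension $D_\mathcal{E}$ and is therefore Beta$(1, D_\mathcal{E}-1)$ distributed; the function name, trial count and seed are hypothetical. For qubit subenvironments the steps are exponentially distributed with $\bar{d} = \Delta d = \frac{1}{2}$, and the expected number of steps needed to exceed a threshold $d_\delta$ is exactly $d_\delta/\bar{d} + 1$.

```python
import numpy as np

def mean_fragment_size(d_delta, d_env, trials=20000, seed=1, max_steps=10000):
    """Average number of subenvironments needed before d exceeds d_delta."""
    rng = np.random.default_rng(seed)
    sizes = np.empty(trials)
    for t in range(trials):
        total, m = 0.0, 0
        while total <= d_delta and m < max_steps:
            # |gamma|^2 for two Haar-random states ~ Beta(1, d_env - 1)
            overlap_sq = rng.beta(1.0, d_env - 1.0)
            total += -0.5 * np.log(overlap_sq)   # additive decoherence factor
            m += 1
        sizes[t] = m
    return float(sizes.mean())

estimate = mean_fragment_size(d_delta=2.0, d_env=2)   # exact answer is 5
```

With $\bar{d} = \Delta d = \frac{1}{2}$ the exact value $d_\delta/\bar{d} + 1$ coincides with $d_\delta/\bar{d} + \Delta d^2/(2\bar{d}^2) + \frac{1}{2}$, so the simulation offers a consistency check on the normal-distribution estimate in the qubit-environment case.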



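2. Specific redundancy for general $D_\mathcal{S}$

Whereas the qubit condition (Eq. 35) has one $|\gamma|^2$ term, Eq. 34 involves a sum of $\frac{1}{2}D_\mathcal{S}(D_\mathcal{S} - 1)$ such terms. Deriving and analyzing a probability distribution for this sum is very difficult, so we take a simpler route. We replace the sum over terms with a single term, $\frac{1}{2}D_\mathcal{S}(D_\mathcal{S} - 1)\,|\gamma|^2$, where $\gamma$ represents all the off-diagonal terms. The new condition for sufficient information is:

$$\begin{aligned}
\frac{1}{2}D_\mathcal{S}(D_\mathcal{S}-1)\,|\gamma|^2 &< \delta D_\mathcal{S} H_\mathcal{S} \\
|\gamma|^2 &< \frac{2\delta H_\mathcal{S}}{D_\mathcal{S}-1} \\
d > d_\delta &\equiv \frac{1}{2}\log\left(\frac{D_\mathcal{S}-1}{2\delta H_\mathcal{S}}\right). \qquad (41)
\end{aligned}$$

$D_\mathcal{S}$ has been incorporated into a redefinition of $d_\delta$. Equation 40 is still valid for qubits, but it generalizes to

$$\bar{m} \approx \frac{\log(D_\mathcal{S}-1) - \log(2\delta H_\mathcal{S})}{2\bar{d}} + \frac{\Delta d^2}{2\bar{d}^2} + \frac{1}{2}. \qquad (42)$$

We combine this expression with Eq. 31 to obtain a general estimate for specific redundancy:

$$r_\delta = \frac{2\bar{d}^2\,(1-\delta)}{\Delta d^2 + \bar{d}^2 + \bar{d}\left(\log(D_\mathcal{S}-1) - \log(2\delta H_\mathcal{S})\right)}. \qquad (43)$$

3. Dependence of mean decoherence factor ($\bar{d}$) on $D_\mathcal{E}$

The computation of $\bar{d}$ and $\Delta d$ in terms of $D_\mathcal{E}$ is somewhat tedious. Details can be found in the appendix, where we calculate:

$$\bar{d} = \frac{1}{2}\left(\psi(D_\mathcal{E}) + \gamma_{EM}\right), \qquad (44)$$

$$\Delta d^2 = \frac{\pi^2}{24} - \frac{\psi_1(D_\mathcal{E})}{4}, \qquad (45)$$

in terms of the digamma ($\psi(n)$) and trigamma ($\psi_1(n)$) functions [s^ |3^, and the Euler-Mascheroni constant $\gamma_{EM} = 0.577\ldots$. These functions may not be familiar to all readers, so we present the first few values in Table I.

$D_\mathcal{E}$ | 2 | 3 | 4 | 5 | 6 | 8
$\bar{d}$ | 1/2 | 3/4 | 11/12 | 25/24 | 137/120 | 363/280
$\Delta d$ | 1/2 | $\sqrt{5}/4$ | 7/12 | $\sqrt{205}/24$ | $\sqrt{5269}/120$ | $\sqrt{266681}/840$

TABLE I: The table shows the first few values of $\bar{d}$ and $\Delta d$, for environments of size $D_\mathcal{E} \in [2,3,4,5,6,8]$. See the appendix for details on the calculation.

For larger $D_\mathcal{E}$, we can safely approximate Eqs. 44 and 45 as:

$$\bar{d} \approx \frac{1}{2}\left(\log(D_\mathcal{E}) + \gamma_{EM}\right), \qquad (46)$$

$$\Delta d^2 \approx \frac{\pi^2}{24}. \qquad (47)$$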


4. How good is the estimate?

In Figure 11 we compare numerical results to the approximation of Eq. 43. The analytical estimate is very good for qubit systems, but loses some fidelity for larger $D_\mathcal{S}$. A more sophisticated treatment of the multiple $\gamma_{ij}$ terms - each representing an independent observable which the environment must record - would eliminate this error.

To get an intuitive feel for the dependence of $r_\delta$ on its parameters, we consider the regime of large systems, large environments, and small deficit - i.e., $H_0 \gg 1$, $\bar{d} \approx \frac{1}{2}\log(D_\mathcal{E})$, $\Delta d^2 \approx \frac{\pi^2}{24}$, and $\delta \ll 1$. In this regime, we can ruthlessly simplify Eq. 43 to obtain a simple prediction:

$$r_\delta \approx \frac{\log(D_\mathcal{E})}{\log(D_\mathcal{S}) - \log(\delta)}. \qquad (48)$$

The plots in Fig. 12 show the ratio between numerical $r_\delta$ data and the simple predictions of Eq. 48. They confirm that Eq. 48 is a good rule of thumb.

Eq. 48 can be interpreted as a capsule summary of how redundancy scales in the "random-state" model of decoherence.

1. Redundancy is proportional to $N_{env}$, the number of independent subenvironments. More environments produce more redundancy.

2. Redundancy is proportional to $\bar{d}$, the mean decoherence factor of a single subenvironment, which grows as $\log D_\mathcal{E}$. Larger environments produce more redundancy, in proportion to their information capacity.

3. Redundancy is (roughly) inversely proportional to $H_\mathcal{S}$, the total information available about the system. Larger systems require more space in the environment.

4. The deficit ($\delta$) appears as a logarithmic addition to $H_\mathcal{S}$. Reducing the amount of "ignorable" information is equivalent to making the system bigger. Redundancy depends only weakly (logarithmically) on the deficit, $\delta$.

FIG. 11: (Color) Specific redundancy ($r_\delta = R_\delta/N_{env}$): numerical data (symbols) compared with theory (Eq. 43, solid lines). (a): $r_\delta$ vs. $\delta$, for a 16-d system coupled to 2, 3, 4, 8-dimensional subenvironments. (b): $r_\delta$ vs. $\delta$, for 2, 3, 4, 8, 16-d systems coupled to qubit subenvironments. (c): $r_{1\%}$ vs. $D_\mathcal{E}$. (d): $r_{1\%}$ vs. $D_\mathcal{S}$. Discussion: Theory predicts the overall behavior of redundancy well. It is nearly perfect for $D_\mathcal{S} = 2$, but overestimates $r$ for larger systems. As $\delta$ increases, $r_\delta$ saturates and even declines because of the $(1 - \delta)$ prefactor in Eq. 43. When $\delta$ is large, the theory breaks down (see (a)), because a single subenvironment can provide sufficient information.
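The four scaling statements can be explored numerically. The sketch below codes one reading of the general estimate, $r_\delta = 2\bar{d}^2(1-\delta)/[\Delta d^2 + \bar{d}^2 + \bar{d}(\log(D_\mathcal{S}-1) - \log(2\delta H_\mathcal{S}))]$, together with the rule of thumb $\log(D_\mathcal{E})/(\log(D_\mathcal{S}) - \log\delta)$. Both closed forms are reconstructions from a damaged copy of Eqs. 43 and 48 and should be treated as assumptions, as should the choice $H_\mathcal{S} = \ln D_\mathcal{S}$ (a maximally mixed system, natural logarithms) and all function names.

```python
import math

def mean_d(d_env):
    """Mean additive decoherence factor: half the (D-1)th harmonic number."""
    return 0.5 * sum(1.0 / k for k in range(1, d_env))

def var_d(d_env):
    """Variance of the additive decoherence factor for integer D."""
    return 0.25 * sum(1.0 / k ** 2 for k in range(1, d_env))

def specific_redundancy(d_sys, d_env, delta):
    """ASSUMED form of the general estimate for r_delta."""
    h_s = math.log(d_sys)                       # nats, maximally mixed system
    d_bar, dd2 = mean_d(d_env), var_d(d_env)
    spread = math.log(d_sys - 1) - math.log(2 * delta * h_s)
    return 2 * d_bar ** 2 * (1 - delta) / (dd2 + d_bar ** 2 + d_bar * spread)

def rule_of_thumb(d_sys, d_env, delta):
    """ASSUMED form of the simplified prediction."""
    return math.log(d_env) / (math.log(d_sys) - math.log(delta))

r_loose = specific_redundancy(16, 8, 0.10)
r_tight = specific_redundancy(16, 8, 0.01)
r_big_env = specific_redundancy(16, 16, 0.10)
ratio = r_loose / rule_of_thumb(16, 8, 0.10)
```

Under these assumptions a tenfold reduction of the deficit lowers $r_\delta$ only modestly, while enlarging the subenvironments raises it, in line with points 2 and 4 of the list.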



V. CONCLUSIONS AND DISCUSSION 

'There is no information without representation': in- 
formation has to be stored somewhere. To retrieve it, 
we must measure the systems where it is stored. To un- 
derstand the properties of information, we look at the 
properties of this retrieval process. We have focused on 
the question: How easily can information about a 
system be retrieved from its environment? 

The answer is strongly dependent on how the system became correlated with its environment. Random interactions between $\mathcal{S}$ and all of $\mathcal{E}$ leave no useful correlations - to learn about $\mathcal{S}$ we must measure most of $\mathcal{E}$. However, when localized parts of $\mathcal{E}$ interact independently with $\mathcal{S}$, an observer can learn about $\mathcal{S}$ by measuring a small fragment of $\mathcal{E}$. Furthermore, the information that he learns is objective - another independent observer will arrive at the same conclusions.

FIG. 12: (Color) "Efficiency": specific redundancy rescaled by information capacity. Equation 48 provides a simple approximation for redundancy, based on the relative information capacity of the system (with a correction for $\delta$) and its environment. We reproduce the data of Fig. 11 but use Eq. 48 to rescale specific redundancy. Discussion: Efficiency is consistently near to 1: when the universe is in a random branching state, information about $\mathcal{S}$ is efficiently recorded in $\mathcal{E}$. Equation 48 is accurate for large $D_\mathcal{S}$ and $D_\mathcal{E}$ (and small $\delta$). When the system or the subenvironments are small, Eq. 48 underestimates information storage efficiency.

This redundant imprinting of selected observables on the environment is quantum Darwinism. It leads to objective reality in a quantum Universe. Typical PIPs for branching states (see Fig. 13) illustrate how different sorts of information are selected or deprecated. The information in $\mathcal{E}$ about $\mathcal{S}$ divides naturally into three parts.

$$\mathcal{I}_{\mathcal{S}:\mathcal{E}} = \mathcal{I}_R + \mathcal{I}_{NR} + \mathcal{I}_Q. \qquad (49)$$

The redundant information ($\mathcal{I}_R$) is classical - it can be obtained easily, by many independent observers. Its selective proliferation is the essence of quantum Darwinism. Ollivier et al. showed, in j^J, that $\mathcal{I}_R$ is not only easy to obtain, but difficult to ignore. An observer who succeeds in extracting $\mathcal{I}_R$, and continues to probe, finds a "classical plateau". Measurements on additional subenvironments increase his knowledge of $\mathcal{S}$ only slightly - mostly, they only confirm what he already knows. Only a perfect and global measurement of everything can reveal more than the redundant information.

Purely quantum information ($\mathcal{I}_Q$) represents observables that are incompatible with the pointer observable. This is the information that quantum Darwinism selects against. It is (a) encoded amongst the environments, much as a classical bit can be encoded in the parity of many ancilla bits; (b) accessible only through a global measurement on all of $\mathcal{E}$; and (c) easily destroyed when $\mathcal{E}$ decoheres.




FIG. 13: (Color) Quantum Darwinism selects certain observable properties of the system and propagates information about them throughout the environment. The preferred observable(s) become redundant at the expense of incompatible observables. As shown here, PIPs illustrate the results of Quantum Darwinism. Information about $\mathcal{S}$ becomes divided into three parts: redundant information ($\mathcal{I}_R$), quantum information ($\mathcal{I}_Q$), and non-redundant information ($\mathcal{I}_{NR}$). Redundant information is objective, and therefore classical. It can be obtained with relative ease. Quantum information represents the non-preferred observables, marginalized by Quantum Darwinism, which can only be measured by capturing all of $\mathcal{E}$. Non-redundant information (determined by the slope of $\mathcal{I}(m)$ at $m = N_{env}/2$) represents the ambiguous borderline, undifferentiated as yet into classical and quantum fractions. When $\mathcal{I}_{NR}$ is small, the central region of the PIP becomes flat. This "classical plateau" indicates that an observer can obtain full information without capturing the entire environment.

Finally, non-redundant information ($\mathcal{I}_{NR}$) represents a grey area - the border between the classical and quantum domains. It exists only when the classical plateau in $\mathcal{I}(m)$ has a nonzero slope. This is why we allow for a deficit ($\delta$) when computing redundancy.

Information storage in randomly selected arbitrary 
states of the model universe is dramatically different 
from information storage in randomly selected singly- 
branching states. The contrast between these two cases 
emphasizes the importance of the environment's struc- 
ture. Overly simple thermodynamic arguments (e.g., 
maximum entropy in absence of gravity) indicate that 
the physical Universe should evolve into states that are 
uniformly distributed. Our results, however, show that 
objects which display the redundancy characteristic of 
our Universe must have structured correlations with their 
environments. 

Decoherence theory emphasizes the role of the envi- 
ronment in the quantum-to-classical transition, but only 
as a reservoir where unwanted quantum superpositions 
and correlations can be hidden, out of sight. Even this 
view - which now seems somewhat narrow - has pro- 
duced important advances in our understanding over the 
past quarter century. Examples include einselection, the special role of pointer states, and the view of classicality as an emergent phenomenon. Nevertheless, it is clear from our discussion above and from related recent work [Til IT^ , that "tracing out $\mathcal{E}$" obscures crucial aspects of the environment's role.

The environment is a witness - a communication 
channel through which observers acquire the vast ma- 
jority (if not all) of their information about the Uni- 
verse. Surprisingly, this realization has taken more than 
75 years since the formulation of quantum mechanics in 
its present form. It goes against a strong classical tradi- 
tion of looking for solutions of fundamental problems in 
isolated settings. This tradition is incompatible with the 
role of states in quantum theory. 

Quantum states, unlike classical states, do not define 
what "exists objectively" . They are too malleable - too 
easily perturbed and redefined by measurements. More- 
over, in quantum mechanics, what is known about a 
system's state is inextricably intertwined with what it 
is. Classical states, in contrast, have existence indepen- 
dently of the knowledge of them. To put it tersely (and 
in the spirit of complementarity), quantum states play 
both ontic (describing what is) and epistemic (describing 
what is known to be) roles[4||. Thus, for many purposes, 
it makes no sense to talk about a state of a completely 
isolated quantum system. 

Our Universe is 'quantum to the core' (see e.g. Ref. 
|3?t| for an up-to-date review of the experimental evi- 
dence), so the only place to look for objective classicality 
is within the quantum theory itself. Decoherence has 
certainly supplied part of the answer: Only some of the 
states in an open system's Hilbert space are stable. Those 
that are not stable, cannot "exist objectively". Even 
these einselected pointer states, however, are vulnerable 
to perturbation by an observer who measures directly. 
Yet, objectivity implies that many different (and initially 
ignorant) observers can independently find out the state. 

The environment-as-a-witness point of view solves this 
problem by recognizing that we gain essentially all of our 
information indirectly, from the environmental degrees of 
freedom (with the possible exception of specific laboratory experiments). As the environment is the "channel", and as only a part of it can be intercepted, the obvious question is: How is information deposited in $\mathcal{E}$, and what kind of information?

Quantum Darwinism, which we have begun to anal- 
yse here and elsewhere [l|, la, [ill j aims to supply the 
answer. Our basic conclusion is that the redundancy ev- 
ident in our Universe is not a generic property of ran- 
domly selected states in large multipartite (system plus 
multi-component environment) Hilbert spaces. However, 
when states in that Hilbert space are created by the in- 
teractions usually invoked in discussions of environment- 
induced superselection, redundancy appears. Thus, ob- 
jectivity can arise through the dynamics of decoherence. 
In that sense, decoherence is the mechanism that delivers 
quantum Darwinism - a more complete view of classical- 
ity's emergence. 

While we have already witnessed the birth of this new 



point of view, it is still far from mature. In particular, 
our conclusion about redundancy and the typical struc- 
ture of entanglement was reached without analyzing dy- 
namics per se. We have laid the foundation for a full- 
fledged study of quantum Darwinism by analysing kine- 
matic properties of states, and postponed the study of 
evolution in specific models to forthcoming publications 
[3, [3- Moreover, by employing von Neumann entropy, 
we have focused on the amount of information (rather 
than on what this information is about). Differences between various definitions of mutual information exist (see "discord", Ref. [18]), and are symptomatic of the "quantumness" of the underlying correlations. Less "quantum" definitions of mutual information, involving conditional information, de facto presume a measurement. They have also been used [3,[i3j[l3)i along with other tools ([l^ll^), to show that the familiar pointer observables are the "fittest" in the (quantum) Darwinian sense. Studying the dynamics of quantum Darwinism, and the connections with various definitions of information, are the obvious next steps.

APPENDIX A: PROPERTIES OF QMI: THE SYMMETRY THEOREM

The symmetry theorem for QMI is important for understanding the shape of PIPs (partial information plots). It says, in essence, that the amount of information that can be gained from the first few environments to be captured, is mirrored by the amount of information that can be gained from the last few environments. Thus, when capturing a small fraction of $\mathcal{E}$ yields much information, an equivalent amount of information cannot be gained without capturing the last outstanding bits of $\mathcal{E}$.

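Theorem 1 (Mutual Information Symmetry Theorem). Let the universe be in a pure state $|\psi\rangle_{\mathcal{S}\mathcal{E}}$, and let the environment $\mathcal{E}$ be partitioned into two chunks $\mathcal{E}_A$ and $\mathcal{E}_B$. Then the total mutual information between the system and its environment is equal to the sum of the mutual informations between $\mathcal{S}$ and $\mathcal{E}_A$ and between $\mathcal{S}$ and $\mathcal{E}_B$: that is, $\mathcal{I}_{\mathcal{S}:\mathcal{E}} = \mathcal{I}_{\mathcal{S}:\mathcal{E}_A} + \mathcal{I}_{\mathcal{S}:\mathcal{E}_B}$.

Proof. We simply expand each mutual information as $\mathcal{I}_{x:y} = H_x + H_y - H_{xy}$, and use the fact that if a bipartite system $x \otimes y$ has a pure state $|\psi\rangle_{xy}$, then the entropies of the parts are equal; $H_x = H_y$.

$$\begin{aligned}
\mathcal{I}_{\mathcal{S}:\mathcal{E}_A} + \mathcal{I}_{\mathcal{S}:\mathcal{E}_B} &= H_S + H_A - H_{SA} + H_S + H_B - H_{SB} \\
&= H_S + H_A - H_B + H_S + H_B - H_A \\
&= H_S + H_S \\
&= H_S + H_{AB} - 0 \\
&= \mathcal{I}_{\mathcal{S}:\mathcal{E}}
\end{aligned}$$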
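The theorem is easy to confirm numerically for a random pure state. The sketch below is an illustration, not part of the original proof: it builds a random tripartite state with axes ordered (system, chunk A, chunk B), computes every entropy from an explicitly constructed reduced density matrix, and compares the two sides of the theorem. The dimensions, seed and function names are arbitrary choices.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy (bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def reduced_entropy(psi, keep):
    """Entropy of the subsystems (tensor axes) in `keep` for a pure state psi."""
    if not keep:
        return 0.0
    rest = [ax for ax in range(psi.ndim) if ax not in keep]
    dim_keep = int(np.prod([psi.shape[ax] for ax in keep]))
    m = np.transpose(psi, list(keep) + rest).reshape(dim_keep, -1)
    return entropy_bits(m @ m.conj().T)      # reduced density matrix on `keep`

def mutual_information(psi, a, b):
    return (reduced_entropy(psi, a) + reduced_entropy(psi, b)
            - reduced_entropy(psi, a + b))

rng = np.random.default_rng(7)
shape = (3, 4, 5)                            # axes: S, E_A, E_B
psi = rng.normal(size=shape) + 1j * rng.normal(size=shape)
psi /= np.linalg.norm(psi)

i_total = mutual_information(psi, [0], [1, 2])
i_a = mutual_information(psi, [0], [1])
i_b = mutual_information(psi, [0], [2])
h_s = reduced_entropy(psi, [0])
```

The sum `i_a + i_b` agrees with `i_total`, and both equal twice the system entropy, as expected for a pure state of the universe.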

Corollary 1. Under no circumstances can two subenvironments both have $\mathcal{I} > H_\mathcal{S}$ information about the system.

If the universe is in a pure state, then the Symmetry Theorem states that any bipartite division of the environment will yield two chunks, at least one of which has $\mathcal{I} \leq H_\mathcal{S}$. Additionally, we note that a chunk has at least as much $\mathcal{I}$ about the system as any of its sub-chunks (that is, decreasing the size of a chunk cannot increase its $\mathcal{I}$). If we could find two chunks $A$ and $B$ with $\mathcal{I} > H_\mathcal{S}$, then by subsuming the remainder of $\mathcal{E}$ into $A$ we would have a bipartite division into $A'$ and $B$, each of which has $\mathcal{I} > H_\mathcal{S}$ - but this contradicts the Symmetry Theorem.

The proof for a mixed state of the universe follows from the "Church of the Larger Hilbert Space" argument. We purify $\rho_{\mathcal{S}\mathcal{E}}$ by enlarging the environment from $\mathcal{E}$ to $\mathcal{E}'$, and follow the same steps to show that $\mathcal{E}'$ cannot have two subenvironments with $\mathcal{I} > H_\mathcal{S}$. Since $\mathcal{E}$ is a subset of $\mathcal{E}'$, it too cannot have two such subenvironments.

Corollary 2. For a pure state $|\psi\rangle_{\mathcal{S}\mathcal{E}}$ of the universe, the partial information plot (PIP) must be antisymmetric around the point $(m = N/2,\ \mathcal{I} = H_\mathcal{S})$.

This follows straightforwardly from the Symmetry Theorem. For each chunk $\mathcal{E}_{\{m\}}$ of the environment that contains $m$ individual environments, there exists a complementary chunk $\mathcal{E}_{\{N-m\}}$, containing the complement of $\mathcal{E}_{\{m\}}$, with $N - m$ individual environments. The Symmetry Theorem implies that $\mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{m\}}} + \mathcal{I}_{\mathcal{S}:\mathcal{E}_{\{N-m\}}} = \mathcal{I}_{\mathcal{S}:\mathcal{E}} = 2H_\mathcal{S}$. By averaging this equation over all possible chunks $\mathcal{E}_{\{m\}}$, we obtain an equation for the PIP: $\mathcal{I}(m) + \mathcal{I}(N - m) = 2H_\mathcal{S}$. This equation is equivalent to the stated Corollary.

APPENDIX B: PERFECT STATES 

The primary intuition that we obtain from the $\mathcal{I}(m)$ plots is that most states are "encoding" states, but an important sub-ensemble of states are "redundant" states. We are naturally led to ask whether "perfect" examples of each type of state exist - that is, a state that encodes information more redundantly than any other state, or a state that hides the encoded information better than any other state.

The answer is somewhat surprising: whereas perfectly
redundant states exist for any $N$ and any $D_S$, $D_{\mathcal{E}}$, perfect
coding states apparently exist only for certain $N$ (at least
for $D_S = D_{\mathcal{E}} = 2$). The perfectly redundant states are
easy to understand; they are the generalized GHZ (and
GHZ-like) states of the form:

$$ |\psi_{S\mathcal{E}}\rangle = \alpha\,|0\rangle_S \bigotimes_i |0\rangle_{\mathcal{E}_i} + \beta\,|1\rangle_S \bigotimes_i |1\rangle_{\mathcal{E}_i}, \tag{B1} $$

with the obvious generalizations to higher $D_S$, $D_{\mathcal{E}}$. Of
course, it's necessary that $D_{\mathcal{E}} \ge D_S$.

A true GHZ state is invariant under interchange of
any two subsystems; however, since mutual information
is invariant under local unitaries, we only require that the
states $|0\rangle_{\mathcal{E}_i}$ and $|1\rangle_{\mathcal{E}_i}$ be orthogonal.
Clearly, such states exist for all $N$. Any sub-environment with $0 < m < N$
has exactly $H(S)$ information, but only by capturing
the entire environment ($m = N$) can we obtain the full
$\mathcal{I} = 2H(S)$. Thus, the information is stored with $N$-fold
redundancy.
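The counting behind this statement can be spelled out in a few lines. The worked calculation below is a sketch for the qubit form of Eq. (B1); the shorthand $|0_m\rangle$, $|1_m\rangle$ for the product states of the $m$ captured subenvironments is our own notation.

```latex
% Reduced states of the GHZ-like state (B1) for a chunk of m subenvironments, 0 < m < N.
% Tracing out at least one subenvironment removes all coherence between the two branches.
\begin{align}
\rho_S &= |\alpha|^2\,|0\rangle\langle 0| + |\beta|^2\,|1\rangle\langle 1|, \\
\rho_{\mathcal{E}_{\{m\}}} &= |\alpha|^2\,|0_m\rangle\langle 0_m| + |\beta|^2\,|1_m\rangle\langle 1_m|, \\
\rho_{S\mathcal{E}_{\{m\}}} &= |\alpha|^2\,|0\rangle\langle 0|\otimes|0_m\rangle\langle 0_m|
                              + |\beta|^2\,|1\rangle\langle 1|\otimes|1_m\rangle\langle 1_m|.
\end{align}
% All three states have the same spectrum {|alpha|^2, |beta|^2}, hence the same entropy H(S):
\begin{align}
\mathcal{I}_{S:\mathcal{E}_{\{m\}}} &= H(S) + H(\mathcal{E}_{\{m\}}) - H(S,\mathcal{E}_{\{m\}})
                                     = H(S), \qquad 0 < m < N, \\
\mathcal{I}_{S:\mathcal{E}} &= H(S) + H(S) - 0 = 2H(S), \qquad m = N .
\end{align}
```

The last line uses the purity of the global state: $H(S,\mathcal{E}) = 0$ and $H(\mathcal{E}) = H(S)$.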

A perfect coding state, on the other hand, would be one
where $\overline{\mathcal{I}}(m) = 0$ for any $m < N/2$, and
$\overline{\mathcal{I}}(m) = \mathcal{I}_{S\mathcal{E}}$ for
$m > N/2$. An equivalent condition, for qubit universes, is
the existence of two orthogonal states of $N$ qubits, each of
which is maximally entangled under all possible bipartite
divisions. If such pairs of states exist, then the system
states $|0\rangle$ and $|1\rangle$ can be correlated with them to produce
the perfect coding state. It is known (as detailed in [23])
that such states only exist for $N = 2, 3, 5, 6$, and possibly
for $N = 7$ (for $N = 6$, only a single state exists [38]).
Thus, while for large $N$ almost every state is an excellent
coding state, perfect examples seem not to exist except
for $N = 2, 3, 5, (7?)$! We are not aware of any results for
non-qubit systems.

APPENDIX C: ENTROPY OF A 
NEAR-DIAGONAL DENSITY MATRIX 

Suppose that the pure state $\pi = |\psi\rangle\langle\psi|$, whose
components in the pointer basis are

$$ \langle i|\psi\rangle = s_i, \tag{C1} $$

is subjected to decoherence. The off-diagonal elements
are reduced according to

$$ \pi_{ij} \rightarrow \sigma_{ij} = \gamma_{ij}\pi_{ij}, \tag{C2} $$

where $\gamma_{ii} = 1$ for all $i$. The limiting point of the process,
where $\gamma_{ij} = 0$ for all $i \neq j$, is $\rho$:

$$ \rho_{ij} = \delta_{ij}|s_i|^2. \tag{C3} $$

As the $\gamma_{ij}$ approach zero, $\sigma$ converges to $\rho$. The
partially decohered $\sigma$ can be written as

$$ \sigma = \rho + \Delta, \tag{C4} $$

where $\Delta$ is strictly off-diagonal. $\Delta$ is defined by

$$ \Delta_{ij} = (1 - \delta_{ij})\,\gamma_{ij}\,s_i s_j^*. \tag{C5} $$

As $\sigma$ approaches $\rho$, its entropy approaches the entropy of
$\rho$. Our goal here is to write $H(\sigma)$ as a power series (in
$\Delta$) around $H(\rho)$.
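The entropy of $\sigma$ is

$$ H(\sigma) = -\mathrm{Tr}(\sigma\ln\sigma) = \mathrm{Tr}\,[\hat{H}(\sigma)], \tag{C6} $$

where

$$ \hat{H}(\sigma) = -\sigma\ln\sigma. \tag{C7} $$

The difference between $H(\sigma)$ and $H(\rho)$ is

$$ \delta H = \mathrm{Tr}(\delta\hat{H}) = \mathrm{Tr}\,[\hat{H}(\rho + \Delta) - \hat{H}(\rho)]. \tag{C8} $$

We will seek a power series for $\delta\hat{H}$. Keeping in mind that
its trace is the relevant quantity, we will discard traceless
terms.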



1. A naive approach to expanding $\hat{H}(\rho + \Delta)$

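It's tempting to begin by expanding Eq. (C7) around
$\sigma = \rho$. Using the Maclaurin series (in $\Delta$) for $-\sigma\ln\sigma$ gives

$$ \delta\hat{H} = -\Delta(\mathbb{1} + \ln\rho) - \sum_{n=0}^{\infty} \frac{(-1)^n}{(n+1)(n+2)}\,\frac{\Delta^{n+2}}{\rho^{n+1}} \tag{C9} $$

$$ \approx -\frac{\Delta^2}{2\rho} + \frac{\Delta^3}{6\rho^2} - \ldots \tag{C10} $$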



We discarded the first term because it is traceless. Unfortunately,
matrix quotients are not well-defined. $\Delta^2/\rho$ could
mean either $\Delta^2\rho^{-1}$ or $\rho^{-1}\Delta^2$ - and, in fact,
both are nonsymmetric and therefore incorrect. Other symmetric orderings,
such as $\rho^{-1/2}\Delta^2\rho^{-1/2}$, also give incorrect results.
The expansion in Eq. (C10) is an inappropriate generalization
of a scalar expansion, and is ill-defined. We will
take a different approach which (a) gives the correct result,
and (b) defines the correct representation of matrix
quotients.



2. The correct approach 

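Instead of expanding $\hat{H}(\sigma)$ around $\sigma = \rho$, we expand
both $\hat{H}(\sigma)$ and $\hat{H}(\rho)$ around the identity.

$$ \delta\hat{H} = \hat{H}(\rho + \Delta) - \hat{H}(\rho) = \hat{H}(\mathbb{1} - (\mathbb{1} - \rho - \Delta)) - \hat{H}(\mathbb{1} - (\mathbb{1} - \rho)). $$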



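The expansion around $\mathbb{1}$ is always well-defined, because
$\mathbb{1}$ and its inverse commute with everything:

$$ \hat{H}(\mathbb{1} - x) = x - \sum_{n=0}^{\infty} \frac{x^{n+2}}{(n+1)(n+2)}. \tag{C11} $$

Using this expansion in $\delta\hat{H}$ yields

$$ \delta\hat{H} = -\Delta + \sum_{n=0}^{\infty} \frac{(\mathbb{1} - \rho)^{n+2} - (\mathbb{1} - \rho - \Delta)^{n+2}}{(n+1)(n+2)}. \tag{C12} $$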



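We once again discard $\Delta$ because it is traceless, leaving
only the sum. The two matrix powers within the sum
can be rewritten using the identity

$$ (\mathbb{1} - x)^{n+2} = \sum_{j=0}^{n+2} \binom{n+2}{j} (-1)^j x^j, \tag{C13} $$

which yields

$$ \delta\hat{H} = -\sum_{n=0}^{\infty} \sum_{j=1}^{n+2} \frac{(-1)^j}{(n+1)(n+2)} \binom{n+2}{j} \left[ (\rho + \Delta)^j - \rho^j \right]. \tag{C14} $$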

In order to simplify this, we must introduce a new notation.
Consider $(x + y)^p$, where $x$ and $y$ may be either
scalars or matrices. For scalar $x$ and $y$,

$$ (x + y)^p = \sum_{k=0}^{p} \binom{p}{k} x^k y^{p-k}, \tag{C15} $$

whereas for matrices, $\binom{p}{k} x^k y^{p-k}$ is replaced by a sum
over the $\binom{p}{k}$ orderings of $k$ $x$'s and $p - k$ $y$'s. We define the
notation $x^k \circ y^{p-k}$ to describe this sum, normalized by the
number of orderings: e.g.,

$$ x^2 \circ y^2 = \frac{x^2y^2 + xyxy + xy^2x + yx^2y + yxyx + y^2x^2}{6}, \tag{C16} $$

but when $x$ and $y$ are scalars

$$ x^2 \circ y^2 = x^2 y^2. \tag{C17} $$
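The symmetric product is easy to experiment with directly. The sketch below uses plain nested lists for small square matrices; the helper names (`sym_product`, `binomial_expansion`, and the others) are hypothetical, and the average over orderings follows the normalization of Eq. (C16). It checks that the matrix power $(x+y)^p$ is recovered from a binomial-type sum over symmetric products.

```python
from itertools import combinations
from math import comb

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b, scale=1.0):
    """Return a + scale * b for square matrices stored as nested lists."""
    n = len(a)
    return [[a[i][j] + scale * b[i][j] for j in range(n)] for i in range(n)]

def power(a, p):
    out = identity(len(a))
    for _ in range(p):
        out = matmul(out, a)
    return out

def sym_product(x, k, y, l):
    """x^k o y^l: average over all comb(k + l, k) orderings of k x's and l y's."""
    n = len(x)
    total = [[0.0] * n for _ in range(n)]
    for x_slots in combinations(range(k + l), k):
        word = identity(n)
        for slot in range(k + l):
            word = matmul(word, x if slot in x_slots else y)
        total = add(total, word)
    return [[v / comb(k + l, k) for v in row] for row in total]

def binomial_expansion(x, y, p):
    """Sum over k of comb(p, k) * (x^k o y^(p-k)); should equal (x + y)^p."""
    n = len(x)
    total = [[0.0] * n for _ in range(n)]
    for k in range(p + 1):
        total = add(total, sym_product(x, k, y, p - k), scale=comb(p, k))
    return total
```

For non-commuting $2\times 2$ matrices `x` and `y`, `binomial_expansion(x, y, p)` should match `power(add(x, y), p)` up to rounding, while for $1\times 1$ "matrices" the symmetric product reduces to the ordinary product of Eq. (C17).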



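Using this definition of a totally symmetric product,

$$ (\rho + \Delta)^p = \sum_{k=0}^{p} \binom{p}{k} \Delta^k \circ \rho^{p-k}, \tag{C18} $$

and the entropy difference operator $\delta\hat{H}$ is

$$ \delta\hat{H} = -\sum_{n=0}^{\infty} \sum_{j=1}^{n+2} \sum_{k=0}^{j-1} \frac{(-1)^j}{(n+1)(n+2)} \binom{n+2}{j} \binom{j}{k+1} \Delta^{k+1} \circ \rho^{j-k-1} \tag{C19} $$

$$ = \sum_{k=1}^{\infty} \sum_{n=0}^{\infty} \sum_{j=0}^{n} \frac{(-1)^{j+k}}{(n+k)(n+k+1)} \binom{n+k+1}{j+k+1} \binom{j+k+1}{k+1} \Delta^{k+1} \circ \rho^{j} \tag{C20} $$

$$ \quad - \sum_{n=0}^{\infty} \sum_{j=0}^{n+1} \frac{(-1)^{j+1}}{(n+1)(n+2)} (j+1) \binom{n+2}{j+1} \Delta \circ \rho^{j}. \tag{C21} $$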



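The $k = 0$ term can be discarded because $\mathrm{Tr}(\Delta \circ \rho^j) = \mathrm{Tr}(\Delta\rho^j) = 0$. We then perform the sum over $j$ to obtain

$$ \delta\hat{H} = \sum_{k=1}^{\infty} \sum_{n=0}^{\infty} \frac{(-1)^k}{(n+k)(n+k+1)} \binom{n+k+1}{k+1} \Delta^{k+1} \circ (\mathbb{1} - \rho)^n. \tag{C22} $$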



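Expanding the binomial coefficients and simplifying
leads to the following result:

$$ \delta\hat{H} = \sum_{k=1}^{\infty} \frac{(-1)^k}{k(k+1)} \sum_{n=0}^{\infty} \frac{(n+k-1)!}{n!\,(k-1)!}\, \Delta^{k+1} \circ (\mathbb{1} - \rho)^n. \tag{C23} $$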

We have come full circle. The sum over $n$ in Eq. (C23)
is just the Maclaurin expansion for $\rho^{-k}$ around $\rho = \mathbb{1}$.
Equation (C23) can thus be written symbolically as

$$ \delta\hat{H} = \sum_{k=1}^{\infty} \frac{(-1)^k}{k(k+1)}\,\frac{\Delta^{k+1}}{\rho^k}, \tag{C24} $$

if the symmetric product $\Delta^{k+1} \circ \rho^{-k}$ is interpreted as
"take the symmetric product of $\Delta^{k+1}$ with the power
series representing $\rho^{-k}$."

Essentially, what we have derived is the "correct" interpretation
of the matrix quotient $\Delta^{k+1}/\rho^k$. This result is
interesting in its own right, but for now we are interested
only in the leading order (i.e., $\Delta^2$) term. Truncating the
series at $k = 1$, we obtain the following simple result:

$$ \delta H \approx -\frac{1}{2} \sum_{n=0}^{\infty} \mathrm{Tr}\left[\Delta^2 \circ (\mathbb{1} - \rho)^n\right] + O(\Delta^3). \tag{C25} $$

This is the simplest possible general form for $\delta H$. In
order to perform the traces, we need to take advantage
of the form of the symmetric product.



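From the definition of the symmetric product, we can
write out explicit expressions for $\Delta^k \circ M^n$, for particular
small values of $k$.

$$ \Delta \circ M^n = \frac{1}{n+1} \sum_{p=0}^{n} M^p \Delta M^{n-p} \tag{C26} $$

$$ \Delta^2 \circ M^n = \frac{2}{(n+1)(n+2)} \sum_{p=0}^{n} \sum_{q=0}^{n-p} M^p \Delta M^q \Delta M^{n-p-q} \tag{C27} $$

The second case (for $\Delta^2$) is the useful one. We need the
trace of the symmetric product, which can be simplified
using the cyclic property of trace.

$$ \mathrm{Tr}\left[\Delta^2 \circ M^n\right] = \frac{1}{n+1} \sum_{p=0}^{n} \mathrm{Tr}\left[\Delta M^p \Delta M^{n-p}\right]. \tag{C28} $$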



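Together with Eq. (C25) this formula yields an explicit
expression for $\delta H$:

$$ \delta H \approx -\frac{1}{2} \sum_{n=0}^{\infty} \frac{1}{n+1} \sum_{p=0}^{n} \mathrm{Tr}\left[\Delta (\mathbb{1} - \rho)^p \Delta (\mathbb{1} - \rho)^{n-p}\right]. \tag{C29} $$

We now insert specific forms for $\rho$ and $\Delta$, from Eqs. (C3) and (C5), writing $M = \mathbb{1} - \rho$:

$$ \mathrm{Tr}\left[\Delta M^p \Delta M^{n-p}\right] = \sum_{i,j} \Delta_{ij}\,(\mathbb{1} - \rho)^p_{jj}\,\Delta_{ji}\,(\mathbb{1} - \rho)^{n-p}_{ii} $$

$$ = \sum_{i,j,k,l=0}^{D_S-1} s_i s_j^* s_k s_l^*\,\gamma_{ij}\gamma_{kl}\,\delta_{jk}\delta_{il}\,(1 - \delta_{ij}) \left(1 - |s_j|^2\right)^p \left(1 - |s_i|^2\right)^{n-p}. \tag{C30} $$

Since the goal is to average over an ensemble of states, we replace $|\gamma_{ij}|^2$ with an average, $|\gamma|^2$,

$$ \mathrm{Tr}\left[\Delta M^p \Delta M^{n-p}\right] \approx |\gamma|^2 \sum_{i \neq j} |s_i|^2 |s_j|^2 \left(1 - |s_j|^2\right)^p \left(1 - |s_i|^2\right)^{n-p} \tag{C31} $$

$$ = |\gamma|^2 \left[ \sum_{i,j} |s_i|^2 |s_j|^2 \left(1 - |s_j|^2\right)^p \left(1 - |s_i|^2\right)^{n-p} - \sum_{i} |s_i|^4 \left(1 - |s_i|^2\right)^n \right] \tag{C32} $$

$$ = |\gamma|^2 \left[ \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^p\right] \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^{n-p}\right] - \mathrm{Tr}\left[\rho^2(\mathbb{1} - \rho)^n\right] \right]. \tag{C33} $$

Inserting this expression into Eq. (C29) yields

$$ \delta H \approx -\frac{|\gamma|^2}{2} \sum_{n=0}^{\infty} \frac{1}{n+1} \sum_{p=0}^{n} \left[ \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^p\right] \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^{n-p}\right] - \mathrm{Tr}\left[\rho^2(\mathbb{1} - \rho)^n\right] \right]. \tag{C34} $$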



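Finally, we can simplify this expression slightly by (1) taking advantage of the identity $\sum_{n=0}^{\infty} (\mathbb{1} - \rho)^n = \rho^{-1}$, and (2) rearranging the summation variables.

$$ \delta H \approx -\frac{|\gamma|^2}{2} \left[ \sum_{n=0}^{\infty} \sum_{p=0}^{\infty} \frac{\mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^p\right] \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^n\right]}{n + p + 1} - \mathrm{Tr}\left( \rho^2 \sum_{n=0}^{\infty} (\mathbb{1} - \rho)^n \right) \right] \tag{C35} $$

$$ = -\frac{|\gamma|^2}{2} \left[ \sum_{n=0}^{\infty} \sum_{p=0}^{\infty} \frac{\mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^p\right] \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^n\right]}{n + p + 1} - \mathrm{Tr}\left( \rho^2 \rho^{-1} \right) \right] \tag{C36} $$

$$ = -\frac{|\gamma|^2}{2} \left[ \sum_{n=0}^{\infty} \sum_{p=0}^{\infty} \frac{\mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^p\right] \mathrm{Tr}\left[\rho(\mathbb{1} - \rho)^n\right]}{n + p + 1} - 1 \right]. \tag{C37} $$

Equation (C37) is the simplest form we have been able to achieve, except in very special cases, for $H(\rho + \Delta) - H(\rho)$.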
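As a sanity check on the leading-order result, the series can be compared with the exact entropy change in the simplest case. The sketch below assumes a single system qubit ($D_S = 2$) with $\rho = \mathrm{diag}(p_0, 1 - p_0)$ and one real decoherence factor $\gamma$; the function names and the truncation parameter `n_max` are illustrative assumptions, and the infinite sums of Eqs. (C29) and (C37) are simply cut off at `n_max`.

```python
import math

def entropy(probs):
    """Von Neumann entropy (natural log) from a list of eigenvalues."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def exact_delta_h(p0, gamma):
    """H(rho + Delta) - H(rho) for a qubit with rho = diag(p0, 1 - p0)."""
    p1 = 1.0 - p0
    c2 = gamma ** 2 * p0 * p1                    # |Delta_01|^2, from Eq. (C5)
    r = math.sqrt(0.25 * (p0 - p1) ** 2 + c2)    # eigenvalues of sigma are 1/2 +/- r
    return entropy([0.5 + r, 0.5 - r]) - entropy([p0, p1])

def delta_h_c29(p0, gamma, n_max=300):
    """Truncated double sum of Eq. (C29) for the qubit case."""
    p1 = 1.0 - p0
    c2 = gamma ** 2 * p0 * p1
    m0, m1 = 1.0 - p0, 1.0 - p1                  # diagonal entries of (1 - rho)
    total = 0.0
    for n in range(n_max + 1):
        # Tr[Delta M^p Delta M^(n-p)] = |Delta_01|^2 (m1^p m0^(n-p) + m0^p m1^(n-p))
        inner = sum(m1 ** p * m0 ** (n - p) + m0 ** p * m1 ** (n - p)
                    for p in range(n + 1))
        total += c2 * inner / (n + 1)
    return -0.5 * total

def delta_h_c37(p0, gamma, n_max=300):
    """Truncated double sum of Eq. (C37) for the qubit case."""
    p1 = 1.0 - p0
    # t[n] = Tr[rho (1 - rho)^n]
    t = [p0 * (1.0 - p0) ** n + p1 * (1.0 - p1) ** n for n in range(n_max + 1)]
    double_sum = sum(t[n] * t[p] / (n + p + 1)
                     for n in range(n_max + 1) for p in range(n_max + 1))
    return -0.5 * gamma ** 2 * (double_sum - 1.0)
```

For small $\gamma$ the two truncated series should coincide with each other and lie close to `exact_delta_h`. For a qubit the $\Delta^3$ term is traceless, so the residual difference is expected to be of fourth order in $\gamma$.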



APPENDIX D: PROBABILITY DISTRIBUTIONS 
FOR ADDITIVE DECOHERENCE FACTORS 

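If $|\psi\rangle$ and $|\psi'\rangle$ are selected at random from the uniform
ensemble of $D_{\mathcal{E}}$-dimensional quantum states, then the
probability that $|\langle\psi|\psi'\rangle| = \gamma$ (for $\gamma \in [0, 1]$) is

$$ p(\gamma) = 2(D_{\mathcal{E}} - 1)\,\gamma\,(1 - \gamma^2)^{D_{\mathcal{E}} - 2}. \tag{D1} $$

The additive decoherence factor $d$ is given by $d = -\log(\gamma)$,
so that $\gamma = e^{-d}$ and $d \in [0, \infty)$. The probability
distribution transforms as

$$ p(d)\,\mathrm{d}d = p(\gamma)\,\mathrm{d}\gamma, $$

$$ p(d) = p(\gamma)\left|\frac{\mathrm{d}\gamma}{\mathrm{d}d}\right| = 2(D_{\mathcal{E}} - 1)\,e^{-2d}\left(1 - e^{-2d}\right)^{D_{\mathcal{E}} - 2}. \tag{D2} $$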



The decoherence factor for a collection of subenvironments
is simply the sum of $d^{(i)}$ over the contributing
subenvironments. Ideally, we could obtain exact distributions
$p_m(d)$ for a sum of $m$ such $d$-factors. For an
environment composed of qubits ($D_{\mathcal{E}} = 2$), $p(d)$ is a 1st-order
Poisson distribution, so $p_m(d)$ is just the $m$th order
Poisson distribution (for details, see [20]).

For larger subenvironments ($D_{\mathcal{E}} > 2$), no such simple
description exists. However, the distribution functions
$p(d)$ are well-approximated by Gaussian distributions.
We can treat the summing problem as a biased
random walk, where the addition of another subenvironment
represents a step forward with an approximately
Gaussian-distributed stepsize.

To compute the mean and variance of an $m$-step random
walk, we first compute the mean value $\bar{d}$ and variance
$\Delta d^2 = \overline{d^2} - \bar{d}^2$ for a single subenvironment.
Extrapolating to a collection of $m$ systems requires setting
$\bar{d}_m = m\bar{d}$ and $\Delta d_m = \sqrt{m}\,\Delta d$.

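For a single subenvironment, the mean $\bar{d}$ is given by
$\bar{d} = \int_0^\infty d\,p(d)\,\mathrm{d}d$. This integral is somewhat nontrivial,
involving an expansion in binomial coefficients:

$$ \bar{d} = 2(D_{\mathcal{E}} - 1) \int_0^\infty d\,e^{-2d}\left(1 - e^{-2d}\right)^{D_{\mathcal{E}} - 2}\,\mathrm{d}d $$

$$ = 2(D_{\mathcal{E}} - 1) \int_0^\infty d\,e^{-2d} \sum_{k=0}^{D_{\mathcal{E}} - 2} \binom{D_{\mathcal{E}} - 2}{k} \left(-e^{-2d}\right)^k \mathrm{d}d $$

$$ = \frac{D_{\mathcal{E}} - 1}{2} \sum_{k=0}^{D_{\mathcal{E}} - 2} \frac{(-1)^k\,(D_{\mathcal{E}} - 2)!}{(k+1)^2\,k!\,(D_{\mathcal{E}} - 2 - k)!} $$

$$ = \frac{\psi(D_{\mathcal{E}}) + \gamma_{\mathrm{EM}}}{2}, \tag{D3} $$

where $\psi(D_{\mathcal{E}})$ is the digamma function, and $\gamma_{\mathrm{EM}} = 0.5772\ldots$
is the Euler-Mascheroni constant. A virtually
identical calculation for $\overline{d^2}$ yields

$$ \Delta d^2 = \frac{\pi^2}{24} - \frac{\psi_1(D_{\mathcal{E}})}{4} \tag{D4} $$

in terms of the trigamma function $\psi_1(D_{\mathcal{E}})$.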
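These single-step statistics can be checked by direct sampling. The following is a minimal Monte Carlo sketch using only the Python standard library; all function names are our own. It assumes two things beyond the text: by unitary invariance one of the two random states is fixed to a basis vector, and for integer $D_{\mathcal{E}}$ the special functions are evaluated through the standard identities $\psi(D) + \gamma_{\mathrm{EM}} = \sum_{k=1}^{D-1} 1/k$ and $\psi_1(D) = \pi^2/6 - \sum_{k=1}^{D-1} 1/k^2$.

```python
import math
import random

def sample_d(dim, rng):
    """One sample of d = -ln|<psi|psi'>| with psi Haar-random and psi' a basis state."""
    amps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return -math.log(abs(amps[0]) / norm)

def predicted_mean(dim):
    """Eq. (D3): (psi(D) + gamma_EM) / 2, written as a harmonic sum for integer D."""
    return 0.5 * sum(1.0 / k for k in range(1, dim))

def predicted_variance(dim):
    """Eq. (D4): pi^2/24 - psi_1(D)/4, with psi_1(D) from its integer-argument sum."""
    trigamma = math.pi ** 2 / 6.0 - sum(1.0 / k ** 2 for k in range(1, dim))
    return math.pi ** 2 / 24.0 - trigamma / 4.0

def monte_carlo(dim, samples=20000, seed=0):
    """Sample mean and (unbiased) sample variance of d for one subenvironment."""
    rng = random.Random(seed)
    ds = [sample_d(dim, rng) for _ in range(samples)]
    mean = sum(ds) / samples
    var = sum((d - mean) ** 2 for d in ds) / (samples - 1)
    return mean, var
```

Comparing `monte_carlo(dim)` with `predicted_mean(dim)` and `predicted_variance(dim)` for a few values of `dim` gives a quick consistency check of Eqs. (D3) and (D4). Summing $m$ independent samples from `sample_d` would test the random-walk scaling in the same way.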